


arXiv: 1203.5753 



POSTERIOR CONTRACTION RATES FOR THE BAYESIAN 
APPROACH TO LINEAR ILL-POSED INVERSE PROBLEMS 

By Sergios Agapiou*, Stig Larsson†
and Andrew M. Stuart*

University of Warwick* and Chalmers University of Technology†



We consider a Bayesian nonparametric approach to a family of
linear inverse problems in a separable Hilbert space setting with
Gaussian noise. We assume Gaussian priors which are conjugate to
the model and present a method of identifying the posterior using its
precision operator. Working with the unbounded precision operator
enables us to use partial differential equations (PDE) methodology
to obtain rates of contraction of the posterior distribution to a Dirac
measure centered on the true solution. Our methods assume a
relatively weak relation between the prior covariance, noise covariance
and forward operator.

1. Introduction. The solution of inverse problems provides a rich source of
applications of the Bayesian nonparametric methodology. It encompasses a wide
range of applications from partial differential equations (PDEs) [2], and there is a
well-developed theory of classical, non-statistical, regularization [7]. On the other
hand, the area of nonparametric Bayesian statistical estimation and in particular
the problem of posterior consistency has attracted a lot of interest in recent years;
see for instance [9, 22, 21, 24, 25, 10, 6]. Despite this, the formulation of many
of these PDE inverse problems using the Bayesian approach is in its infancy [23].
Furthermore, the development of a theory of Bayesian posterior consistency,
analogous to the theory for classical regularization, is under-developed with the primary
contribution being the recent paper [14]. This recent paper provides a roadmap for
what is to be expected regarding Bayesian posterior consistency, but is limited in
terms of applicability by the assumption of simultaneous diagonalizability of the
three linear operators required to define Bayesian inversion. Our aim in this paper
is to make a significant step in the theory of Bayesian posterior consistency for
linear inverse problems by developing a methodology which sidesteps the need for
simultaneous diagonalizability. The central idea underlying the analysis is to work
with precision operators rather than covariance operators, and thereby to enable
use of powerful tools from PDE theory to facilitate the analysis.

Let $\mathcal{X}$ be a separable Hilbert space, with norm $\|\cdot\|$ and inner product $\langle\cdot,\cdot\rangle$, and
let $A\colon \mathcal{D}(A) \subset \mathcal{X} \to \mathcal{X}$ be a known self-adjoint and positive-definite linear operator
with bounded inverse. We consider the inverse problem to find $u$ from $y$, where $y$
is a noisy observation of $A^{-1}u$. We assume the model

(1.1) $y = A^{-1}u + \frac{1}{\sqrt{n}}\,\xi$,

where $\frac{1}{\sqrt{n}}\xi$ is an additive noise. We will be particularly interested in the small noise
limit where $n \to \infty$.

A popular method in the deterministic approach to inverse problems is the
generalized Tikhonov-Phillips regularization method in which $u$ is approximated by the
minimizer of a regularized least squares functional: define the Tikhonov-Phillips
functional

(1.2) $J_0(u) := \frac{1}{2}\|C_1^{-1/2}(y - A^{-1}u)\|^2 + \frac{\lambda}{2}\|C_0^{-1/2}u\|^2$,

where $C_i\colon \mathcal{X} \to \mathcal{X}$, $i = 0, 1$, are bounded, possibly compact, self-adjoint
positive-definite linear operators. The parameter $\lambda$ is called the regularization parameter,
and in the classical non-probabilistic approach the general practice is to choose it
as an appropriate function of the noise size $n^{-1/2}$, which shrinks to zero as $n \to \infty$,
in order to recover the unknown parameter $u$ [7].

In this paper we adopt a Bayesian approach for the solution of problem (1.1),
which will be linked to the minimization of $J_0$ via the posterior mean. We assume
that the prior distribution is Gaussian, $u \sim \mu_0 = \mathcal{N}(0, \tau^2 C_0)$, where $\tau > 0$ and
$C_0$ is a self-adjoint, positive-definite, trace class, linear operator on $\mathcal{X}$. We also
assume that the noise is Gaussian, $\xi \sim \mathcal{N}(0, C_1)$, where $C_1$ is a self-adjoint
positive-definite, bounded, but not necessarily trace class, linear operator; this allows us
to include the case of white observational noise. We assume that the, generally
unbounded, operators $C_0^{-1}$ and $C_1^{-1}$ have been maximally extended to self-adjoint
positive-definite operators on appropriate domains. The unknown parameter and
the noise are considered to be independent, thus the conditional distribution of
the observation given the unknown parameter $u$ is also Gaussian with distribution

$y \,|\, u \sim \mathcal{N}(A^{-1}u, \tfrac{1}{n} C_1)$.

Define $\lambda = \frac{1}{n\tau^2}$ and let

(1.3) $J(u) = nJ_0(u) = \frac{n}{2}\|C_1^{-1/2}(y - A^{-1}u)\|^2 + \frac{1}{2\tau^2}\|C_0^{-1/2}u\|^2$.

In finite dimensions the probability density of the posterior distribution with respect
to the Lebesgue measure is proportional to $\exp(-J(u))$. This suggests that, in the
infinite-dimensional setting, the posterior is Gaussian $\mu^y = \mathcal{N}(m, C)$, where we can
identify the posterior covariance and mean by the equations

(1.4) $C^{-1} = nA^{-1}C_1^{-1}A^{-1} + \frac{1}{\tau^2}C_0^{-1}$

and

(1.5) $\frac{1}{n}C^{-1}m = A^{-1}C_1^{-1}y$,

obtained by completing the square. We present a method of justifying these
expressions in Section 4. We define

(1.6) $B_\lambda = \frac{1}{n}C^{-1} = A^{-1}C_1^{-1}A^{-1} + \lambda C_0^{-1}$

and observe that the dependence of $B_\lambda$ on $n$ and $\tau$ is only through $\lambda$. Since

(1.7) $B_\lambda m = A^{-1}C_1^{-1}y$,

the posterior mean also depends only on $\lambda$: $m = m_\lambda$. This is not the case for the
posterior covariance $C$, since it depends on $n$ and $\tau$ separately: $C = C_{\lambda,n}$. In the
following, we suppress the dependence of the posterior covariance on $\lambda$ and $n$ and
we denote it by $C$.

Observe that the posterior mean is the minimizer of the functional $J$, hence also
of $J_0$, that is, the posterior mean is the Tikhonov-Phillips regularized approximate
solution of problem (1.1), for the functional $J_0$.

In [18] and [16], formulae for the posterior covariance and mean are identified in
the infinite-dimensional setting, which avoid using any of the inverses of the prior,
posterior or noise covariance operators. They obtain

(1.8) $C = \tau^2 C_0 - \tau^2 C_0 A^{-1}(A^{-1}C_0A^{-1} + \lambda C_1)^{-1}A^{-1}C_0$

and

(1.9) $m = C_0A^{-1}(A^{-1}C_0A^{-1} + \lambda C_1)^{-1}y$,

which are consistent with formulae (1.4) and (1.7) for the finite-dimensional case.
In [18] this is done only for $C_1$ of trace class while in [16] the case of white
observational noise was included. We will work in an infinite-dimensional setting where
the formulae (1.4), (1.7) for the posterior covariance and mean can be justified.
Working with the unbounded operator $B_\lambda$ opens the possibility of using tools of
analysis, and also numerical analysis, familiar from the theory of partial differential
equations.
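
As a sanity check on the algebra, the following minimal Python sketch compares the precision form (1.4), (1.7) with the covariance form (1.8), (1.9) in a finite-dimensional setting. It assumes, purely for illustration, that all three operators are diagonal in a common basis, so that each is represented by a list of eigenvalues; the function names and all numerical values are hypothetical and not part of the paper, whose analysis does not require diagonality.

```python
# Finite-dimensional, diagonal illustration (an assumption made for simplicity;
# the analysis in the paper does NOT require simultaneous diagonalizability).
# Each operator is represented by its list of eigenvalues in a shared basis.


def posterior_precision_form(a_inv, c0, c1, y, n, tau):
    """Posterior mean and covariance via the precision operator, eqs. (1.4), (1.6), (1.7)."""
    lam = 1.0 / (n * tau ** 2)  # lambda = 1 / (n tau^2)
    mean, cov = [], []
    for ai, c0i, c1i, yi in zip(a_inv, c0, c1, y):
        b_lam = ai * ai / c1i + lam / c0i      # B_lambda, eq. (1.6)
        mean.append((ai / c1i) * yi / b_lam)   # solves B_lambda m = A^{-1} C_1^{-1} y
        cov.append(1.0 / (n * b_lam))          # C = (n B_lambda)^{-1}
    return mean, cov


def posterior_covariance_form(a_inv, c0, c1, y, n, tau):
    """Posterior mean and covariance via eqs. (1.8), (1.9), with no inverse of C_0 or C_1."""
    lam = 1.0 / (n * tau ** 2)
    mean, cov = [], []
    for ai, c0i, c1i, yi in zip(a_inv, c0, c1, y):
        # C_0 A^{-1} (A^{-1} C_0 A^{-1} + lambda C_1)^{-1}
        gain = c0i * ai / (ai * c0i * ai + lam * c1i)
        mean.append(gain * yi)                                  # eq. (1.9)
        cov.append(tau ** 2 * c0i - tau ** 2 * gain * ai * c0i)  # eq. (1.8)
    return mean, cov
```

In this diagonal toy setting the two routes agree coordinate by coordinate up to rounding, which is the finite-dimensional consistency referred to in the text.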

In our analysis we always assume that $C_0^{-1}$ is regularizing, that is, we assume that
$C_0^{-1}$ dominates $B_\lambda$ in the sense that it induces stronger norms than $A^{-1}C_1^{-1}A^{-1}$.
This is a reasonable assumption since otherwise we would have $B_\lambda \simeq A^{-1}C_1^{-1}A^{-1}$
(here $\simeq$ is used loosely to indicate two operators which induce equivalent norms; we
will make this notion precise in due course). This would imply that the posterior
mean is $m \simeq Ay$, meaning that we attempt to invert the data by applying the,
generally discontinuous, operator $A$ [7, Proposition 2.7].

We study the consistency of the posterior $\mu^y$ in the frequentist setting. To this
end, we consider data $y = y^\dagger$ which is a realization of

(1.10) $y^\dagger = A^{-1}u^\dagger + \frac{1}{\sqrt{n}}\,\xi, \quad \xi \sim \mathcal{N}(0, C_1)$,

where $u^\dagger$ is a fixed element of $\mathcal{X}$; that is, we consider observations which are
perturbations of the image of a fixed true solution $u^\dagger$ by an additive noise $\xi$, scaled
by $\frac{1}{\sqrt{n}}$. Since the posterior depends through its mean on the data and also through
its covariance operator on the scaling of the noise and the prior, this choice of
data model gives as posterior distribution the Gaussian measure $\mu^{y^\dagger}_{\lambda,n} = \mathcal{N}(m^\dagger_\lambda, C)$,
where $C$ is given by (1.4) and

(1.11) $B_\lambda m^\dagger_\lambda = A^{-1}C_1^{-1}y^\dagger$.

We study the behavior of the posterior $\mu^{y^\dagger}_{\lambda,n}$ as the noise disappears ($n \to \infty$).
Our aim is to show that it contracts to a Dirac measure centered on the fixed true
solution $u^\dagger$. In particular, we aim to determine $\varepsilon_n$ such that

(1.12) $\mathbb{E}^{y^\dagger}\mu^{y^\dagger}_{\lambda,n}\{u : \|u - u^\dagger\| \ge M_n\varepsilon_n\} \to 0, \quad \forall M_n \to \infty$,

where the expectation is with respect to the random variable $y^\dagger$ distributed
according to the data likelihood $\mathcal{N}(A^{-1}u^\dagger, \tfrac{1}{n}C_1)$.

As in the deterministic theory of inverse problems, in order to get convergence in
the small noise limit, we let the regularization disappear in a carefully chosen way,
that is, we will choose $\lambda = \lambda(n)$ such that $\lambda \to 0$ as $n \to \infty$. The assumption that
$C_0^{-1}$ dominates $B_\lambda$ shows that $B_\lambda$ is a singularly perturbed unbounded (usually
differential) operator, with an inverse which blows up in the limit $\lambda \to 0$. This,
together with equation (1.7), opens up the possibility of using the analysis of such
singular limits to study posterior contraction: on the one hand, as $\lambda \to 0$, $B_\lambda^{-1}$
becomes unbounded; on the other hand, as $n \to \infty$, we have more accurate data,
suggesting that for the appropriate choice of $\lambda = \lambda(n)$ we can get $m^\dagger_\lambda \simeq u^\dagger$. In
particular, we will choose $\tau$ as a function of the scaling of the noise, $\tau = \tau(n)$,
under the restriction that the induced choice of $\lambda = \lambda(n) = \frac{1}{n\tau(n)^2}$ is such that
$\lambda \to 0$ as $n \to \infty$. The last choice will be made in a way which optimizes the rate
of posterior contraction $\varepsilon_n$, defined in (1.12). In general there are three possible
asymptotic behaviors of the scaling of the prior $\tau^2$ as $n \to \infty$ [25, 14]:

i) $\tau^2 \to \infty$; we increase the prior spread, if we know that draws from the prior
are more regular than $u^\dagger$;
ii) $\tau^2$ fixed; draws from the prior have the same regularity as $u^\dagger$;
iii) $\tau^2 \to 0$ at a rate slower than $\frac{1}{n}$; we shrink the prior spread, when we know
that draws from the prior are less regular than $u^\dagger$.
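
The three regimes can be made concrete with a small Python sketch. It assumes a purely hypothetical power-law scaling $\tau(n) = n^p$, which is an illustrative choice and not one prescribed by the paper; under that assumption $\lambda(n) = n^{-(1+2p)}$, so the restriction $\lambda \to 0$ corresponds to $p > -1/2$.

```python
def regularization_parameter(n, p):
    """lambda(n) = 1 / (n * tau(n)^2) for the hypothetical choice tau(n) = n**p.

    p > 0        : tau^2 -> infinity (regime i, prior spread increases)
    p = 0        : tau^2 fixed       (regime ii)
    -1/2 < p < 0 : tau^2 -> 0 slower than 1/n (regime iii, prior spread shrinks)
    """
    tau = float(n) ** p
    return 1.0 / (n * tau ** 2)


def regularization_vanishes(p):
    """True when lambda(n) = n^{-(1 + 2p)} tends to zero as n grows."""
    return 1.0 + 2.0 * p > 0.0
```

At the boundary $p = -1/2$ the prior variance shrinks exactly like $1/n$ and $\lambda$ stays constant, which is why regime iii) requires a rate strictly slower than $1/n$.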

The problem of posterior contraction in this context is also investigated in [14]
and [8]. In [14], sharp convergence rates are obtained in the case where $C_0$, $C_1$ and
$A^{-1}$ are simultaneously diagonalizable, with eigenvalues decaying algebraically, and
in particular $C_1 = I$, that is, the data are polluted by white noise. In this paper we
relax the assumptions on the relations between the operators $C_0$, $C_1$ and $A^{-1}$, by
assuming that appropriate powers of them induce comparable norms (see Section
2). In [8], the non-diagonal case is also examined; the three operators involved are
related through domain inclusion assumptions. The assumptions made in [8] can
be quite restrictive in practice; our assumptions include settings not covered in [8],
and in particular the case of white observational noise.

Provided the problem is sufficiently ill-posed and the true solution $u^\dagger$ is
sufficiently regular, in Corollary 6.5 below, we get the convergence in (1.12) for

$\varepsilon_n = n^{-\frac{\gamma \wedge (\Delta+1)}{2(\Delta + \gamma \wedge (\Delta+1) - 1 + s_0 + \varepsilon)}}$.

The parameters $\gamma$, $s_0$ and $\Delta$ measure the regularity of the true solution, the
regularity of the prior and the ill-posedness of the problem respectively, as explained
in Sections 2 and 6. Even though we work in a more difficult non-diagonal setting,
we get rates which, up to $\varepsilon > 0$ arbitrarily small, agree with the sharp convergence
rates obtained in the diagonal case in [14] for a wide range of values of the
parameter $\gamma$ (Figure 1). Our rates fail to be optimal when the true solution is too regular;
in particular our rates saturate earlier, and we also require more regularity from
the true solution in order to get convergence. Both discrepancies are attributed to
the fact that our method relies on interpolating between rates in a strong and a
weak norm of the error $e = m^\dagger_\lambda - u^\dagger$; on the one hand the rate of the error in the
weak norm saturates earlier and on the other hand the error in the strong norm
requires additional regularity in order to converge.
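
To see the saturation effect numerically, the exponent in the rate above can be evaluated with a short Python sketch. The function name and the sample parameter values are hypothetical; the formula is the one quoted above from Corollary 6.5, and the sketch does not check the corollary's own hypotheses (sufficient ill-posedness and regularity).

```python
def contraction_exponent(gamma, delta, s0, eps=0.0):
    """Exponent r in eps_n = n^{-r}, as quoted from Corollary 6.5.

    gamma : regularity of the true solution
    delta : degree of ill-posedness (Delta in the text)
    s0    : regularity parameter of the prior
    eps   : the arbitrarily small loss; eps = 0 gives the limiting exponent
    """
    g = min(gamma, delta + 1.0)  # gamma ^ (Delta + 1): saturation level
    return g / (2.0 * (delta + g - 1.0 + s0 + eps))
```

For fixed `delta` and `s0`, the exponent stops improving once `gamma` exceeds `delta + 1`, which is the saturation described in the text.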




Fig 1. Exponents of rates of contraction plotted against the regularity of the true solution, $\gamma$. In
blue are the sharp convergence rates obtained in the diagonal case in [14], while in green the rates
predicted by our method, which applies to the more general non-diagonal case.



In the following section we present our assumptions and their implications.
In Section 3, we reformulate equation (1.7) as a weak equation in an
infinite-dimensional space. In Section 4, we present a new method of identifying the
posterior distribution: we first characterize it through its Radon-Nikodym derivative
with respect to the prior (Theorem 4.1) and then justify the formulae (1.4), (1.7) for
the posterior covariance and mean (Theorem 4.2). In Section 5, we present operator
norm bounds for $B_\lambda^{-1}$ in terms of the singular parameter $\lambda$, which are the key to
the posterior contraction results contained in Section 6 (Theorems 6.1 and 6.2 and
Corollaries 6.4 and 6.5). In Section 7, we present a nontrivial example satisfying
our assumptions and provide the corresponding rates of convergence. In Section 8,
we compare our results to known minimax rates of convergence in the case where
$C_0$, $C_1$ and $A^{-1}$ are all diagonalizable in the same eigenbasis and have eigenvalues
that decay algebraically. Finally, Section 9 is a short conclusion.

2. The Setting. In this section we present the setting in which we formulate
our results. First, we define the spaces in which we work; in particular, we define the
Hilbert scale induced by the prior covariance operator $C_0$. Then we determine the
spaces to which draws from the prior, $\mu_0$, and white noise belong. Furthermore, we
state our main assumptions, which concern the connections between the operators
$C_0$, $C_1$ and $A^{-1}$, and present regularity results for draws from the prior, $\mu_0$, and
the noise distribution, $\mathcal{N}(0, C_1)$. Finally we briefly overview the way in which the
Hilbert scale defined in terms of the prior covariance operator $C_0$, which is natural
for our analysis, links to scales of spaces defined independently of any prior model.

2.1. Assumptions. We start by defining the Hilbert scale which we will use in
our analysis. Recall that $\mathcal{X}$ is an infinite-dimensional separable Hilbert space and
$C_0\colon \mathcal{X} \to \mathcal{X}$ is a self-adjoint, positive-definite, trace class, linear operator. Since
$C_0\colon \mathcal{X} \to \mathcal{X}$ is injective and self-adjoint we have that $\mathcal{X} = \overline{\mathcal{R}(C_0)} \oplus \mathcal{R}(C_0)^\perp = \overline{\mathcal{R}(C_0)}$.
This means that $C_0^{-1}\colon \mathcal{R}(C_0) \to \mathcal{X}$ is a densely defined, unbounded, symmetric,
positive-definite, linear operator in $\mathcal{X}$. Hence it can be extended to a self-adjoint
operator with domain $\mathcal{D}(C_0^{-1}) := \{u \in \mathcal{X} : C_0^{-1}u \in \mathcal{X}\}$; this is the Friedrichs
extension [15]. Thus, we can define the Hilbert scale $(X^t)_{t \in \mathbb{R}}$, with $X^t := \overline{\mathcal{M}}^{\|\cdot\|_t}$ [7],
where

$\mathcal{M} := \bigcap_{k=0}^{\infty} \mathcal{D}(C_0^{-k}), \quad \langle u, v\rangle_t := \langle C_0^{-t/2}u, C_0^{-t/2}v\rangle \quad \text{and} \quad \|u\|_t := \|C_0^{-t/2}u\|$.

The bounded linear operator $C_1\colon \mathcal{X} \to \mathcal{X}$ is assumed to be self-adjoint,
positive-definite (but not necessarily trace class); thus $C_1^{-1}\colon \mathcal{R}(C_1) \to \mathcal{X}$ can be extended
in the same way to a self-adjoint operator with domain $\mathcal{D}(C_1^{-1}) := \{u \in \mathcal{X} :
C_1^{-1}u \in \mathcal{X}\}$. Finally, recall that we assume that $A\colon \mathcal{D}(A) \to \mathcal{X}$ is a self-adjoint and
positive-definite, linear operator with bounded inverse, $A^{-1}\colon \mathcal{X} \to \mathcal{X}$.

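Let $\{\lambda_k^2, \phi_k\}_{k=1}^{\infty}$ be orthonormal eigenpairs of $C_0$ in $\mathcal{X}$. Thus, $\{\lambda_k\}_{k=1}^{\infty}$ are the
singular values and $\{\phi_k\}_{k=1}^{\infty}$ an orthonormal eigenbasis. Since $C_0$ is trace class we
have that $\sum_{k=1}^{\infty} \lambda_k^2 < \infty$. In fact we make the following stronger assumption:

Assumption 2.1. There is a $\sigma_0 \in (0, 1]$ such that $\sum_{k=1}^{\infty} \lambda_k^{2(1-\sigma)} < \infty$ for all
$\sigma < \sigma_0$.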

We assume that we have a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The expected value is
denoted by $\mathbb{E}$ and $\xi \sim \mu$ means that the law of the random variable $\xi$ is the measure
$\mu$.

Let $\mu_0 := \mathcal{N}(0, \tau^2 C_0)$ and $\mathbb{P}_0 := \mathcal{N}(0, \tfrac{1}{n}C_1)$ be the prior and noise distributions
respectively. Furthermore, let $\nu(du, dy)$ denote the measure constructed by taking $u$
and $y|u$ as independent Gaussian random variables $\mathcal{N}(0, \tau^2 C_0)$ and $\mathcal{N}(A^{-1}u, \tfrac{1}{n}C_1)$
respectively:

$\nu(du, dy) = \mathbb{P}(dy|u)\,\mu_0(du)$,

where $\mathbb{P} := \mathcal{N}(A^{-1}u, \tfrac{1}{n}C_1)$. We denote by $\nu_0(du, dy)$ the measure constructed by
taking $u$ and $y$ as independent Gaussian random variables $\mathcal{N}(0, \tau^2 C_0)$ and $\mathcal{N}(0, \tfrac{1}{n}C_1)$
respectively:

$\nu_0(du, dy) = \mathbb{P}_0(dy) \otimes \mu_0(du)$.

In the following, we exploit the regularity properties of a white noise to determine
the regularity of draws from the prior and the noise distributions. We consider a
white noise to be a draw from $\mathcal{N}(0, I)$, that is, a random variable $\zeta \sim \mathcal{N}(0, I)$. Even
though the identity operator is not trace class in $\mathcal{X}$, it is trace class in a bigger
space $X^{-s}$, where $s > 0$ is sufficiently large.

Lemma 2.2. Under the Assumption 2.1 we have:

i) Let $\zeta$ be a white noise. Then $\mathbb{E}\|C_0^{s/2}\zeta\|^2 < \infty$ for all $s > s_0 := 1 - \sigma_0$.
ii) Let $u \sim \mu_0$. Then $u \in X^{\sigma}$ $\mu_0$-a.s. for every $\sigma < \sigma_0$.

Proof.

i) We have that $C_0^{s/2}\zeta \sim \mathcal{N}(0, C_0^s)$, thus $\mathbb{E}\|C_0^{s/2}\zeta\|^2 < \infty$ is equivalent to $C_0^s$ being
of trace class. By the Assumption 2.1 it suffices to have $s > 1 - \sigma_0$.

ii) We have $\mathbb{E}\|C_0^{-\sigma/2}u\|^2 = \mathbb{E}\|C_0^{(1-\sigma)/2}C_0^{-1/2}u\|^2 = \mathbb{E}\|C_0^{(1-\sigma)/2}\zeta\|^2$, where $\zeta$ is a white
noise, therefore using part (i) we get the result.

□
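
The trace condition in part (i) can be illustrated numerically. The Python sketch below assumes, as an illustrative model only, exactly algebraic eigenvalues $\lambda_k^2 = k^{-1/s_0}$ with $s_0 \in (0, 1)$; the function name is hypothetical. Under this model $\mathbb{E}\|C_0^{s/2}\zeta\|^2 = \sum_k \lambda_k^{2s} = \sum_k k^{-s/s_0}$, which is finite precisely when $s > s_0$.

```python
def trace_partial_sum(s, s0, terms):
    """Partial sum of sum_k lambda_k^{2s} for the model lambda_k^2 = k^{-1/s0}.

    This is the truncated trace of C_0^s, i.e. a truncation of E||C_0^{s/2} zeta||^2
    for a white noise zeta. The algebraic eigenvalue model is an assumption made
    for illustration.
    """
    return sum(k ** (-s / s0) for k in range(1, terms + 1))


def tail_growth(s, s0, small=1000, large=10000):
    """Increase of the partial sum when the truncation level grows from small to large."""
    return trace_partial_sum(s, s0, large) - trace_partial_sum(s, s0, small)
```

For $s > s_0$ the tail growth is small and shrinks with the truncation level, while at the borderline $s = s_0$ the sum is harmonic and the growth stays of order $\log(\text{large}/\text{small})$.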

Remark 2.3. Note that as $\sigma_0$ changes, both the Hilbert scale and the decay
of the coefficients of a draw from $\mu_0$ change. The norms $\|\cdot\|_t$ are defined through
powers of the eigenvalues $\lambda_k^2$. If $\sigma_0 < 1$, then $C_0$ has eigenvalues that decay like $k^{-1/s_0}$,
thus an element $u \in X^t$ has coefficients $\langle u, \phi_k\rangle$ that decay faster than $k^{-\frac{1}{2} - \frac{t}{2s_0}}$.
As $\sigma_0$ gets larger, that is, as $s_0$ gets closer to zero, the space $X^t$ for a fixed $t > 0$
corresponds to a faster decay rate of the coefficients. At the same time, by the last
lemma, draws from $\mu_0$ belong to $X^{\sigma}$ for all $\sigma < \sigma_0$. Consequently, as
$\sigma_0$ gets larger, not only do draws from $\mu_0$ belong to $X^{\sigma}$ for larger $\sigma$, but also the
spaces $X^{\sigma}$ for fixed $\sigma$ reflect faster decay rates of the coefficients. The case $\sigma_0 = 1$
corresponds to $C_0$ having eigenvalues that decay faster than any negative power of
$k$. A draw from $\mu_0$ in that case has coefficients that decay faster than any negative
power of $k$.

We now state a number of assumptions regarding interrelations between the
three operators $C_0$, $C_1$ and $A^{-1}$; these assumptions reflect the idea that

$C_1 \simeq C_0^{\beta}$ and $A^{-1} \simeq C_0^{\ell}$,

for some $\beta \ge 0$, $\ell \ge 0$, where $\simeq$ is used in the same manner as in Section 1. This is
made precise by the inequalities presented in the following assumption, where the
notation $a \asymp b$ means that there exist constants $c, c' > 0$ such that $ca \le b \le c'a$.

Assumption 2.4. Suppose there exist $\beta \ge 0$, $\ell \ge 0$ and constants $c_i > 0$, $i =
1, \dots, 4$, such that, for $\Delta := 2\ell - \beta + 1$, we have

1. $\Delta > 2s_0$;
2. $\|C_1^{-1/2}A^{-1}u\| \asymp \|C_0^{\ell - \beta/2}u\|$, $\forall u \in X^{\beta - 2\ell}$;
3. $\|C_0^{-\rho/2}C_1^{1/2}u\| \le c_1\|C_0^{(\beta-\rho)/2}u\|$, $\forall u \in X^{\rho - \beta}$, $\forall \rho < \beta - s_0$;
4. $\|C_0^{s/2}C_1^{-1/2}u\| \le c_2\|C_0^{(s-\beta)/2}u\|$, $\forall u \in X^{\beta - s}$, $\forall s \in (s_0, 1]$;
5. $\|C_0^{-s/2}C_1^{-1/2}A^{-1}u\| \le c_3\|C_0^{(2\ell-\beta-s)/2}u\|$, $\forall u \in X^{s+\beta-2\ell}$, $\forall s \in (s_0, 1]$;
6. $\|C_0^{\eta/2}A^{-1}C_1^{-1}u\| \le c_4\|C_0^{\eta/2+\ell-\beta}u\|$, $\forall u \in X^{2\beta-2\ell-\eta}$, $\forall \eta \in [\beta - 2\ell, 1]$;

where $s_0 = 1 - \sigma_0 \in [0, 1)$ is given by Assumption 2.1.



Notice that, by Assumption 2.4(1) we have $2\ell - \beta > -1$ which, in combination
with Assumption 2.4(2), implies that

$\langle C_1^{-1/2}A^{-1}u, C_1^{-1/2}A^{-1}u\rangle + \lambda\langle C_0^{-1/2}u, C_0^{-1/2}u\rangle \le c\,\langle C_0^{-1/2}u, C_0^{-1/2}u\rangle, \quad \forall u \in X^1$,

capturing the idea that the regularization through $C_0$ is indeed a regularization.
In fact the assumption $\Delta > 2s_0$ connects the ill-posedness of the problem to the
regularity of the prior. The value of $2s_0$ becomes larger when $C_0$ is less regular, in
which case we require a bigger value of $\Delta$, which means a more ill-posed problem.

Lemma 2.5. Under the Assumptions 2.1 and 2.4 we have:

i) $u \in X^{s_0+\beta-2\ell+\varepsilon}$ $\mu_0$-a.s. for all $0 < \varepsilon < (\Delta - 2s_0) \wedge \sigma_0$;
ii) $A^{-1}u \in \mathcal{D}(C_1^{-1/2})$ $\mu_0$-a.s.;
iii) $\xi \in X^{\rho}$ $\mathbb{P}_0$-a.s. for all $\rho < \beta - s_0$;
iv) $y \in X^{\rho}$ $\nu$-a.s. for all $\rho < \beta - s_0$.

Proof.

i) We can choose an $\varepsilon$ as in the statement by the Assumption 2.4(1). By Lemma
2.2(ii), it suffices to show that $s_0 + \beta - 2\ell + \varepsilon < \sigma_0$. Indeed, $s_0 + \beta - 2\ell + \varepsilon =
s_0 + 1 - \Delta + \varepsilon < 1 - s_0 = \sigma_0$.

ii) Under Assumption 2.4(2) it suffices to show that $u \in X^{\beta - 2\ell}$. Indeed, by
Lemma 2.2(ii), we need to show that $\beta - 2\ell < \sigma_0$, which is true since $s_0 \in [0, 1)$
and we assume $\Delta > 2s_0 \ge s_0$, thus $2\ell - \beta + 1 > s_0$.

iii) Noting that $\zeta = C_1^{-1/2}\xi$ is a white noise, using Assumption 2.4(3), we have by
Lemma 2.2(i)

$\mathbb{E}\|\xi\|_{\rho}^2 = \mathbb{E}\|C_0^{-\rho/2}C_1^{1/2}\zeta\|^2 \le c\,\mathbb{E}\|C_0^{(\beta-\rho)/2}\zeta\|^2 < \infty$,

since $\beta - \rho > s_0$.

iv) By (ii) we have that $A^{-1}u$ is $\mu_0$-a.s. in the Cameron-Martin space of the
Gaussian measures $\mathbb{P}$ and $\mathbb{P}_0$, thus the measures $\mathbb{P}$ and $\mathbb{P}_0$ are $\mu_0$-a.s. equivalent [5,
Theorem 2.8] and (iii) gives the result.

□

2.2. Guidelines For Applying The Theory. The theory is naturally developed in
the scale of Hilbert spaces defined via the prior. However, application of the theory
may be more natural in a different functional setting. We explain how the two may
be connected. Let $\{\psi_k\}_{k \in \mathbb{N}}$ be an orthonormal basis of the separable Hilbert space
$\mathcal{X}$. We define the spaces $\hat{X}^t$, $t \in \mathbb{R}$, as follows: for $t > 0$ we set

$\hat{X}^t := \{u \in \mathcal{X} : \sum_{k=1}^{\infty} k^{2t}\langle u, \psi_k\rangle^2 < \infty\}$

and the spaces $\hat{X}^{-t}$, $t > 0$, are defined by duality, $\hat{X}^{-t} := (\hat{X}^t)^*$.

For example, if we restrict ourselves to functions on a periodic domain $D = [0, L]^d$
and assume that $\{\psi_k\}_{k \in \mathbb{N}}$ is the Fourier basis of $\mathcal{X} = L^2(D)$, then the spaces $\hat{X}^t$
can be identified with the Sobolev spaces of periodic functions $H^t$, by rescaling:
$H^t = \hat{X}^{t/d}$ [20, Proposition 5.39].

In the case $\sigma_0 < 1$, as explained in Remark 2.3 we have algebraic decay of the
eigenvalues of $C_0$ and in particular $\lambda_k^2$ decay like $k^{-1/s_0}$. If $C_0$ is diagonalizable in
the basis $\{\psi_k\}_{k \in \mathbb{N}}$, that is, if $\phi_k = \psi_k$, $k \in \mathbb{N}$, then it is straightforward to identify
the spaces $X^t$ with the spaces $\hat{X}^{t/(2s_0)}$. The advantage of this identification is that
the spaces $\hat{X}^t$ do not depend on the prior so one can use them as a fixed reference
point for expressing regularity, for example of the true solution.

In our subsequent analysis, we will require that the true solution lives in the
Cameron-Martin space of the prior $X^1$, which for different choices of the prior
(different $\sigma_0$) is a different space. Furthermore, we will assume that the true solution
lives in $X^{\gamma}$ for some $\gamma \ge 1$ and provide the convergence rate depending on the
parameters $\gamma, s_0, \beta, \ell$. The identification $X^{\gamma} = \hat{X}^{\gamma/(2s_0)}$ and the intuitive relation
between the spaces $\hat{X}^t$ and the Sobolev spaces enable us to understand the meaning
of the assumptions on the true solution.

We can now formulate the following guidelines for applying the theory presented
in the present paper: we work in a separable Hilbert space $\mathcal{X}$ with an orthonormal
basis $\{\psi_k\}_{k \in \mathbb{N}}$ and we have some prior knowledge about the true solution $u^\dagger$ which
can be expressed in terms of the spaces $\hat{X}^t$. The noise is assumed to be Gaussian
$\mathcal{N}(0, C_1)$, and the forward operator is known; that is, $C_1$ and $A^{-1}$ are known. We
choose the prior $\mathcal{N}(0, C_0)$, that is, we choose the covariance operator $C_0$ and we can
find the value of $s_0$. If the operator $C_0$ is chosen to be diagonal in the basis $\{\psi_k\}_{k \in \mathbb{N}}$
then we can find the regularity of the true solution in terms of the spaces $X^t$, that
is, the value of $\gamma$ such that $u^\dagger \in X^{\gamma}$, and check that $\gamma \ge 1$, which is necessary for
our theory to work. We then find the values of $\beta$ and $\ell$ and calculate the value
of $\Delta$ appearing in Assumption 2.4, checking that our choice of the prior is such
that $\Delta > 2s_0$. We now have all the necessary information required for applying the
Corollaries 6.4 and 6.5 presented in Section 6 to get the rate of convergence.

Remark 2.6. Observe that in the above mentioned example of periodic
functions, we have the identification $X^1 = H^{d/(2s_0)}$, thus since $s_0 < 1$ we have that the
assumption $u^\dagger \in X^1$ implies that $u^\dagger \in H^t$, for $t > \frac{d}{2}$. By the Sobolev embedding
theorem [20, Theorem 5.31], this implies that the true solution is always assumed
to be continuous. However, this is not a disadvantage of our method, since in many
cases a Gaussian measure which charges $L^2(D)$ with probability one can be shown
to also charge the space of continuous functions with probability one [23, Lemma
6.25].
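
The guidelines of Section 2.2 amount to a short checklist on the parameters, which can be sketched in Python as follows. The function name and the returned dictionary keys are hypothetical; the sketch only evaluates the two scalar conditions stated in the text, $\gamma \ge 1$ and $\Delta = 2\ell - \beta + 1 > 2s_0$, and does not verify the norm inequalities of Assumption 2.4 themselves.

```python
def check_setting(gamma, s0, beta, ell):
    """Evaluate the scalar conditions from the guidelines of Section 2.2.

    gamma : regularity of the true solution in the scale induced by the prior
    s0    : prior regularity parameter, s0 = 1 - sigma_0, in [0, 1)
    beta  : exponent relating the noise covariance to the prior covariance
    ell   : exponent relating the forward operator to the prior covariance
    """
    delta = 2.0 * ell - beta + 1.0  # Delta from Assumption 2.4
    return {
        "delta": delta,
        "truth_regular_enough": gamma >= 1.0,    # true solution in the Cameron-Martin space
        "prior_compatible": delta > 2.0 * s0,    # Assumption 2.4(1)
    }
```

For instance, white observational noise corresponds to `beta = 0`, in which case the condition on the prior reduces to $2\ell + 1 > 2s_0$.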

3. Properties of the Posterior Mean and Covariance. We now make
sense of the equation (1.7) weakly in the space $X^1$, under the assumptions presented
in the previous section. To do so, we define the operator $B_\lambda$ from (1.6) in $X^1$ and
examine its properties. In Section 4 we demonstrate that (1.4) and (1.7) do indeed
correspond to the posterior covariance and mean.

Consider the equation

(3.1) $B_\lambda w = r$,

where

$B_\lambda = A^{-1}C_1^{-1}A^{-1} + \lambda C_0^{-1}$.

Define the bilinear form $B\colon X^1 \times X^1 \to \mathbb{R}$,

$B(u, v) := \langle C_1^{-1/2}A^{-1}u, C_1^{-1/2}A^{-1}v\rangle + \lambda\langle C_0^{-1/2}u, C_0^{-1/2}v\rangle, \quad \forall u, v \in X^1$.

Definition 3.1. Let $r \in X^{-1}$. An element $w \in X^1$ is called a weak solution of
(3.1), if

$B(w, v) = \langle r, v\rangle, \quad \forall v \in X^1$.

Proposition 3.2. Under the Assumptions 2.4(1) and (2), for any $r \in X^{-1}$
there exists a unique weak solution $w \in X^1$ of (3.1).

Proof. We use the Lax-Milgram theorem in the Hilbert space $X^1$, since $r \in
X^{-1} = (X^1)^*$.

i) $B\colon X^1 \times X^1 \to \mathbb{R}$ is coercive:

$B(u, u) = \|C_1^{-1/2}A^{-1}u\|^2 + \lambda\|C_0^{-1/2}u\|^2 \ge \lambda\|u\|_1^2, \quad \forall u \in X^1$.

ii) $B\colon X^1 \times X^1 \to \mathbb{R}$ is continuous: indeed by the Cauchy-Schwarz inequality
and the Assumptions 2.4(1) and (2),

$|B(u, v)| \le \|C_1^{-1/2}A^{-1}u\|\,\|C_1^{-1/2}A^{-1}v\| + \lambda\|C_0^{-1/2}u\|\,\|C_0^{-1/2}v\|$
$\le c\|u\|_{\beta-2\ell}\|v\|_{\beta-2\ell} + \lambda\|u\|_1\|v\|_1 \le c'\|u\|_1\|v\|_1, \quad \forall u, v \in X^1$.

□

Remark 3.3. The Lax-Milgram theorem defines a bounded operator $S\colon X^{-1} \to
X^1$, such that $B(Sr, v) = \langle r, v\rangle$ for all $v \in X^1$, which has a bounded inverse
$S^{-1}\colon X^1 \to X^{-1}$ such that $B(w, v) = \langle S^{-1}w, v\rangle$ for all $v \in X^1$. Henceforward,
we identify $B_\lambda = S^{-1}$ and $B_\lambda^{-1} = S$. Furthermore, note that in Proposition 3.2,
Lemma 3.4 below, and the three propositions in Section 5, we only require $\Delta > 0$
and not the stronger assumption $\Delta > 2s_0$. However, in all our other results we
actually need $\Delta > 2s_0$.

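Lemma 3.4. Suppose the Assumptions 2.4(1) and (2) hold. Then the operator
$S^{-1} = B_\lambda\colon X^1 \to X^{-1}$ is identical to the operator $A^{-1}C_1^{-1}A^{-1} + \lambda C_0^{-1}\colon X^1 \to X^{-1}$,
where $A^{-1}C_1^{-1}A^{-1}$ is defined weakly in $X^{\beta-2\ell}$.

Proof. The Lax-Milgram theorem implies that $B_\lambda\colon X^1 \to X^{-1}$ is bounded.
Moreover, $C_0^{-1}\colon X^1 \to X^{-1}$ is bounded, thus the operator $K := B_\lambda - \lambda C_0^{-1}\colon X^1 \to
X^{-1}$ is also bounded and satisfies

(3.2) $\langle Ku, v\rangle = \langle C_1^{-1/2}A^{-1}u, C_1^{-1/2}A^{-1}v\rangle, \quad \forall u, v \in X^1$.

Define $A^{-1}C_1^{-1}A^{-1}$ weakly in $X^{\beta-2\ell}$, by the bilinear form $\mathcal{A}\colon X^{\beta-2\ell} \times X^{\beta-2\ell} \to \mathbb{R}$
given by

$\mathcal{A}(u, v) = \langle C_1^{-1/2}A^{-1}u, C_1^{-1/2}A^{-1}v\rangle, \quad \forall u, v \in X^{\beta-2\ell}$.

By Assumption 2.4(2), $\mathcal{A}$ is coercive and continuous in $X^{\beta-2\ell}$, thus by the
Lax-Milgram theorem, there exists a uniquely defined, boundedly invertible, operator
$T\colon X^{2\ell-\beta} \to X^{\beta-2\ell}$ such that $\mathcal{A}(u, v) = \langle T^{-1}u, v\rangle$ for all $v \in X^{\beta-2\ell}$. We identify
$A^{-1}C_1^{-1}A^{-1}$ with the bounded operator $T^{-1}\colon X^{\beta-2\ell} \to X^{2\ell-\beta}$. By Assumption
2.4(1) we have $\Delta > 0$ hence

$\|A^{-1}C_1^{-1}A^{-1}u\|_{-1} \le c\|A^{-1}C_1^{-1}A^{-1}u\|_{2\ell-\beta} \le c\|u\|_{\beta-2\ell} \le c\|u\|_1, \quad \forall u \in X^1$,

that is, $A^{-1}C_1^{-1}A^{-1}\colon X^1 \to X^{-1}$ is bounded. By the definition of $T^{-1} = A^{-1}C_1^{-1}A^{-1}$
and (3.2), this implies that $K = B_\lambda - \lambda C_0^{-1} = A^{-1}C_1^{-1}A^{-1}$. □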

Proposition 3.5. Under the Assumptions 2.1, 2.4(1),(2),(3),(6), there exists
a unique weak solution, $m \in X^1$, of equation (1.7), $\nu(du, dy)$-almost surely.

Proof. It suffices to show that $A^{-1}C_1^{-1}y \in X^{-1}$, $\nu(du, dy)$-almost surely.
Indeed, by Lemma 2.5(iv) we have that $y \in X^{\rho}$ $\nu(du, dy)$-a.s. for all $\rho < \beta - s_0$, thus
by the Assumption 2.4(6)

$\|C_0^{1/2}A^{-1}C_1^{-1}y\| \le c\|C_0^{1/2+\ell-\beta}y\| < \infty$,

since $2\beta - 2\ell - 1 < \beta - s_0$, which holds by the Assumption 2.4(1). □

4. Posterior Identification. Suppose that in the problem (1.1) we have $u \sim
\mu_0 = \mathcal{N}(0, \tau^2 C_0)$ and $\xi \sim \mathcal{N}(0, C_1)$, where $u$ is independent of $\xi$. Then we have that
$y|u \sim \mathbb{P} = \mathcal{N}(A^{-1}u, \tfrac{1}{n}C_1)$. Let $\mu^y$ be the posterior measure on $u|y$.

In this section we prove a number of facts concerning the posterior measure $\mu^y$
for $u|y$. First, in Theorem 4.1 we prove that this measure has density with respect
to the prior measure $\mu_0$, identify this density and show that $\mu^y$ is Lipschitz in $y$,
with respect to the Hellinger metric. Continuity in $y$ will require the introduction
of the space $X^{s+\beta-2\ell}$, to which $u$ drawn from $\mu_0$ belongs almost surely. Secondly,
in Theorem 4.2, we show that $\mu^y$ is Gaussian and identify the covariance and mean
via equations (1.4) and (1.7). This identification will form the basis for our analysis
of posterior contraction in the following sections.

Theorem 4.1. Under the Assumptions 2.1, 2.4(1),(2),(3),(4),(5), the posterior
measure $\mu^y$ is absolutely continuous with respect to $\mu_0$ and

(4.1) $\frac{d\mu^y}{d\mu_0}(u) = \frac{1}{Z(y)}\exp(-\Phi(u, y))$,

where

(4.2) $\Phi(u, y) := \frac{n}{2}\|C_1^{-1/2}A^{-1}u\|^2 - n\langle C_1^{-1/2}y, C_1^{-1/2}A^{-1}u\rangle$

and $Z(y) \in (0, \infty)$ is the normalizing constant. Furthermore, the map $y \mapsto \mu^y$ is
Lipschitz continuous, with respect to the Hellinger metric: let $s = s_0 + \varepsilon$, $0 <
\varepsilon < (\Delta - 2s_0) \wedge \sigma_0$; then there exists $c = c(r)$ such that for all $y, y' \in X^{\beta-s}$ with
$\|y\|_{\beta-s}, \|y'\|_{\beta-s} \le r$

$d_{\mathrm{Hell}}(\mu^y, \mu^{y'}) \le c\|y - y'\|_{\beta-s}$.

Consequently, the $\mu^y$-expectation of any polynomially bounded function
$f\colon X^{s+\beta-2\ell} \to E$, where $(E, \|\cdot\|_E)$ is a Banach space, is locally Lipschitz continuous
in $y$. In particular, the posterior mean is locally Lipschitz continuous in $y$ as a
function $X^{\beta-s} \to X^{s+\beta-2\ell}$.

Theorem 4.2. Under the Assumptions 2.1, 2.4, the posterior measure $\mu^y(du)$
is Gaussian $\mu^y = \mathcal{N}(m, C)$, where $C$ is given by (1.4) and $m$ is a weak solution of
(1.7).

The proofs of these two theorems are presented in the next two subsections. Each
proof is based on a series of lemmas.

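4.1. Proof of Theorem 4.1. In this subsection we prove Theorem 4.1. We first
prove several useful estimates regarding $\Phi$ defined in (4.2), for $u \in X^{s+\beta-2\ell}$ and $y \in
X^{\beta-s}$, where $s \in (s_0, 1]$. Observe that, under the Assumptions 2.1, 2.4(1),(2),(3),
for $s = s_0 + \varepsilon$ where $\varepsilon > 0$ is sufficiently small, Lemma 2.5 implies on the one
hand that $u \in X^{s+\beta-2\ell}$ $\mu_0(du)$-almost surely and on the other hand that $y \in X^{\beta-s}$
$\nu(du, dy)$-almost surely.

Lemma 4.3. Under the Assumptions 2.1, 2.4(2),(4),(5), for any $s \in (s_0, 1]$, the
potential $\Phi$ given by (4.2) satisfies:

i) for every $\delta > 0$ and $r > 0$, there exists an $M = M(\delta, r) \in \mathbb{R}$, such that for all
$u \in X^{s+\beta-2\ell}$ and all $y \in X^{\beta-s}$ with $\|y\|_{\beta-s} \le r$,

$\Phi(u, y) \ge M - \delta\|u\|^2_{s+\beta-2\ell}$;

ii) for every $r > 0$, there exists a $K = K(r) > 0$, such that for all $u \in X^{s+\beta-2\ell}$
and $y \in X^{\beta-s}$ with $\|u\|_{s+\beta-2\ell}, \|y\|_{\beta-s} < r$,

$\Phi(u, y) \le K$;

iii) for every $r > 0$, there exists an $L = L(r) > 0$, such that for all $u_1, u_2 \in
X^{s+\beta-2\ell}$ and $y \in X^{\beta-s}$ with $\|u_1\|_{s+\beta-2\ell}, \|u_2\|_{s+\beta-2\ell}, \|y\|_{\beta-s} < r$,

$|\Phi(u_1, y) - \Phi(u_2, y)| \le L\|u_1 - u_2\|_{s+\beta-2\ell}$;

iv) for every $\delta > 0$ and $r > 0$, there exists a $c = c(\delta, r) \in \mathbb{R}$, such that for all
$y_1, y_2 \in X^{\beta-s}$ with $\|y_1\|_{\beta-s}, \|y_2\|_{\beta-s} < r$ and for all $u \in X^{s+\beta-2\ell}$,

$|\Phi(u, y_1) - \Phi(u, y_2)| \le \exp(\delta\|u\|^2_{s+\beta-2\ell} + c)\,\|y_1 - y_2\|_{\beta-s}$.

Proof.

i) By first using the Cauchy-Schwarz inequality, then the Assumptions 2.4(4)
and (5), and then the Cauchy with $\delta'$ inequality for $\delta' > 0$ sufficiently small,
we have

$\Phi(u, y) = \frac{n}{2}\|C_1^{-1/2}A^{-1}u\|^2 - n\langle C_0^{s/2}C_1^{-1/2}y, C_0^{-s/2}C_1^{-1/2}A^{-1}u\rangle$
$\ge -n\|C_0^{s/2}C_1^{-1/2}y\|\,\|C_0^{-s/2}C_1^{-1/2}A^{-1}u\| \ge -cn\|y\|_{\beta-s}\|u\|_{s+\beta-2\ell}$
$\ge -\frac{cn}{4\delta'}\|y\|^2_{\beta-s} - cn\delta'\|u\|^2_{s+\beta-2\ell} \ge M(r, \delta) - \delta\|u\|^2_{s+\beta-2\ell}$.

ii) By the Cauchy-Schwarz inequality and the Assumptions 2.4(2),(4) and (5),
we have since $s > s_0 \ge 0$

$\Phi(u, y) \le \frac{n}{2}\|C_1^{-1/2}A^{-1}u\|^2 + n\|C_0^{s/2}C_1^{-1/2}y\|\,\|C_0^{-s/2}C_1^{-1/2}A^{-1}u\|$
$\le c\frac{n}{2}\|u\|^2_{\beta-2\ell} + cn\|y\|_{\beta-s}\|u\|_{s+\beta-2\ell} \le K(r)$.

iii) By first using the Assumptions 2.4(4) and (5) and the triangle inequality,
and then the Assumption 2.4(2) and the reverse triangle inequality, we have
since $s > s_0 \ge 0$

$|\Phi(u_1, y) - \Phi(u_2, y)| = \frac{n}{2}\left|\|C_1^{-1/2}A^{-1}u_1\|^2 - \|C_1^{-1/2}A^{-1}u_2\|^2 + 2\langle C_0^{s/2}C_1^{-1/2}y, C_0^{-s/2}C_1^{-1/2}A^{-1}(u_2 - u_1)\rangle\right|$
$\le \frac{n}{2}\left|\|C_1^{-1/2}A^{-1}u_1\|^2 - \|C_1^{-1/2}A^{-1}u_2\|^2\right| + cn\|y\|_{\beta-s}\|u_1 - u_2\|_{s+\beta-2\ell}$
$\le cn\|u_1 - u_2\|_{\beta-2\ell}\,(\|u_1\|_{\beta-2\ell} + \|u_2\|_{\beta-2\ell}) + cnr\|u_1 - u_2\|_{s+\beta-2\ell}$
$\le L(r)\|u_1 - u_2\|_{s+\beta-2\ell}$.

iv) By first using the Cauchy-Schwarz inequality and then the Assumptions 2.4(4)
and (5), we have

$|\Phi(u, y_1) - \Phi(u, y_2)| = n\left|\langle C_0^{s/2}C_1^{-1/2}(y_1 - y_2), C_0^{-s/2}C_1^{-1/2}A^{-1}u\rangle\right|$
$\le n\|C_0^{s/2}C_1^{-1/2}(y_1 - y_2)\|\,\|C_0^{-s/2}C_1^{-1/2}A^{-1}u\|$
$\le cn\|y_1 - y_2\|_{\beta-s}\|u\|_{s+\beta-2\ell}$
$\le \exp(\delta\|u\|^2_{s+\beta-2\ell} + c)\,\|y_1 - y_2\|_{\beta-s}$.

□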

Corollary 4.4. Under the Assumptions 2.1, 2.4(1), (2), (4), (5),

$$Z(y) := \int_X \exp(-\Phi(u,y))\,\mu_0(du) > 0,$$

for all $y \in X^{\beta-s}$, $s = s_0 + \varepsilon$ where $0 < \varepsilon < (\Delta - 2s_0) \wedge \sigma_0$. In particular, if in addition the Assumption 2.4(3) holds, then $Z(y) > 0$ $\nu$-almost surely.

Proof. Fix $y \in X^{\beta-s}$ and set $r = \|y\|_{\beta-s}$. Gaussian measures on separable Hilbert spaces are full [5, Proposition 1.25], hence since by Lemma 2.5(i) $\mu_0(X^{s+\beta-2\ell}) = 1$, we have that $\mu_0(B_{X^{s+\beta-2\ell}}(r)) > 0$. By Lemma 4.3(ii), there exists $K(r) > 0$ such that

$$\int_X \exp(-\Phi(u,y))\,\mu_0(du) \ge \int_{B_{X^{s+\beta-2\ell}}(r)} \exp(-\Phi(u,y))\,\mu_0(du) \ge \int_{B_{X^{s+\beta-2\ell}}(r)} \exp(-K(r))\,\mu_0(du) > 0.$$

Recalling that, under the additional Assumption 2.4(3), by Lemma 2.5(iv) we have $y \in X^{\beta-s}$ $\nu$-almost surely for all $s > s_0$, completes the proof. □

We are now ready to prove Theorem 4.1: 

Proof of Theorem 4.1. Recall that $\nu_0 = \mathbb{P}_0(dy) \otimes \mu_0(du)$ and $\nu = \mathbb{P}(dy|u)\mu_0(du)$. By the Cameron-Martin formula [3, Corollary 2.4.3], since by Lemma 2.5(ii) we have $\mathcal{A}^{-1}u \in \mathcal{D}(\mathcal{C}_1^{-1/2})$ $\mu_0$-a.s., we get for $\mu_0$-almost all $u$

$$\frac{d\mathbb{P}}{d\mathbb{P}_0}(y|u) = \exp(-\Phi(u,y)),$$

thus we have for $\mu_0$-almost all $u$

$$\frac{d\nu}{d\nu_0}(y,u) = \exp(-\Phi(u,y)).$$

By [11, Lemma 5.3] and Corollary 4.4 we have the relation (4.1).
For the proof of the Lipschitz continuity of the posterior measure in $y$, with respect to the Hellinger distance, we apply [23, Theorem 4.2] for $Y = X^{\beta-s}$, $X = X^{s+\beta-2\ell}$, using Lemma 4.3 and the fact that $\mu_0(X^{s+\beta-2\ell}) = 1$, by Lemma 2.5(i). □

4.2. Proof of Theorem 4.2. We first give an overview of the proof of Theorem 4.2. Let $y|u \sim \mathbb{P} = \mathcal{N}(\mathcal{A}^{-1}u, \frac{1}{n}\mathcal{C}_1)$ and $u \sim \mu_0$. Then by Proposition 3.5, there exists a unique weak solution, $m \in X^1$, of (1.7), $\nu(du,dy)$-almost surely. That is, with $\nu(du,dy)$-probability equal to one, there exists an $m = m(y) \in X^1$ such that

$$B(m,v) = b^y(v), \quad \forall v \in X^1,$$

where the bilinear form $B$ is defined in Section 3, and $b^y(v) = \langle \mathcal{A}^{-1}\mathcal{C}_1^{-1}y, v\rangle$. In the following we show that $\mu^y = \mathcal{N}(m, \mathcal{C})$, where $\mathcal{C}$ is the covariance operator associated with $\mathcal{B}_\lambda$ through (1.6).

The proof has the same structure as the proof for the identification of the posterior in [19]. We define the Gaussian measure $\mathcal{N}(m^N, \mathcal{C}^N)$, which is the independent product of a measure identical to $\mathcal{N}(m, \mathcal{C})$ in the finite-dimensional space $X^N$ spanned by the first $N$ eigenfunctions of $\mathcal{C}_0$, and a measure identical to $\mu_0$ in $(X^N)^\perp$. We next show that $\mathcal{N}(m^N, \mathcal{C}^N)$ converges weakly to the measure $\mu^y$ which, as a weak limit of Gaussian measures, has to be Gaussian, $\mu^y = \mathcal{N}(\bar{m}, \bar{\mathcal{C}})$, and we then identify $\bar{m}$ and $\bar{\mathcal{C}}$ with $m$ and $\mathcal{C}$ respectively.

Fix $y$ drawn from $\nu$ and let $P^N$ be the orthogonal projection of $X$ to the finite-dimensional space $\mathrm{span}\{\phi_1, \dots, \phi_N\} := X^N$, where as in Section 2, $\{\phi_k\}_{k=1}^\infty$ is an orthonormal eigenbasis of $\mathcal{C}_0$ in $X$. Let $Q^N = I - P^N$. We define $\mu^{N,y}$ by

(4.3)   $$\frac{d\mu^{N,y}}{d\mu_0}(u) = \frac{1}{Z^N}\exp\left(-\Phi^N(u,y)\right),$$

where $\Phi^N(u,y) := \Phi(P^N u, y)$ and

$$Z^N(y) := \int_X \exp(-\Phi^N(u,y))\,\mu_0(du).$$

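Lemma 4.5. We have $\mu^{N,y} = \mathcal{N}(m^N, \mathcal{C}^N)$, where

$$P^N\mathcal{C}^{-1}P^N m^N = nP^N\mathcal{A}^{-1}\mathcal{C}_1^{-1}y,$$

and $P^N\mathcal{C}^N Q^N = Q^N\mathcal{C}^N P^N = 0$.

Proof. Let $u \in X^N$. Since $u = P^N u$ we have by (4.3)

$$d\mu^{N,y}(P^N u) \propto \exp\left(-\Phi(P^N u; y)\right) d\mu_0(P^N u).$$

The right hand side is $N$-dimensional Gaussian with density proportional to the exponential of the following expression

(4.4)   $$-\frac{n}{2}\|\mathcal{C}_1^{-1/2}\mathcal{A}^{-1}P^N u\|^2 + n\langle \mathcal{C}_1^{-1/2}y, \mathcal{C}_1^{-1/2}\mathcal{A}^{-1}P^N u\rangle - \frac{1}{2\tau^2}\|\mathcal{C}_0^{-1/2}P^N u\|^2,$$

which by completing the square we can write as

$$-\frac{1}{2}\|(\mathcal{C}^N)^{-1/2}(u - m^N)\|^2 + c(y),$$

where $\mathcal{C}^N$ is the covariance matrix and $m^N$ the mean. By equating with expression (4.4), we find that $(\mathcal{C}^N)^{-1} = P^N\mathcal{C}^{-1}P^N$ and $(\mathcal{C}^N)^{-1}m^N = nP^N\mathcal{A}^{-1}\mathcal{C}_1^{-1}y$, thus on $X^N$ we have that $\mu^{N,y} = \mathcal{N}(m^N, \mathcal{C}^N)$. On $(X^N)^\perp$, the Radon-Nikodym derivative in (4.3) is equal to 1, hence $\mu^{N,y} = \mu_0 = \mathcal{N}(0, \tau^2\mathcal{C}_0)$. □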
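
The completing-the-square step in the proof above can be checked numerically in the simplest possible setting. The following is a minimal sketch, assuming the scalar case $N = 1$, in which the operators $\mathcal{A}^{-1}$, $\mathcal{C}_0$, $\mathcal{C}_1$ reduce to positive numbers; the names `a_inv`, `c0`, `c1`, the helper functions and the numerical values are illustrative and do not come from the paper. It compares the exponent (4.4) with the Gaussian form whose precision is $n a^2/c_1 + 1/(\tau^2 c_0)$.

```python
def log_density_44(u, y, n, tau, a_inv, c0, c1):
    """Scalar analogue of expression (4.4): the log-density up to a constant."""
    likelihood = -0.5 * n * (a_inv * u) ** 2 / c1 + n * y * a_inv * u / c1
    prior = -0.5 * u ** 2 / (tau ** 2 * c0)
    return likelihood + prior


def posterior_mean_and_variance(y, n, tau, a_inv, c0, c1):
    """Mean and variance read off after completing the square in u."""
    precision = n * a_inv ** 2 / c1 + 1.0 / (tau ** 2 * c0)
    variance = 1.0 / precision
    mean = variance * n * a_inv * y / c1
    return mean, variance


# Illustrative values only (hypothetical, not taken from the paper).
params = dict(y=0.7, n=50, tau=0.3, a_inv=0.5, c0=2.0, c1=1.5)
mean, variance = posterior_mean_and_variance(**params)

u_a, u_b = 1.2, -0.4
# The constant c(y) cancels in differences, so both forms must agree.
diff_44 = log_density_44(u_a, **params) - log_density_44(u_b, **params)
diff_gauss = (-0.5 * (u_a - mean) ** 2 / variance) - (-0.5 * (u_b - mean) ** 2 / variance)
print(abs(diff_44 - diff_gauss) < 1e-9)
```

Comparing differences of log-densities rather than the values themselves avoids having to compute the normalising constant $c(y)$.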

Proposition 4.6. Under the Assumptions 2.1, 2.4(1), (2), (3), (4), (5), for all $y \in X^{\beta-s}$, $s = s_0 + \varepsilon$, where $0 < \varepsilon < (\Delta - 2s_0) \wedge \sigma_0$, the measures $\mu^{N,y}$ converge weakly in $X$ to $\mu^y$, where $\mu^y$ is defined in Theorem 4.1. In particular, $\mu^{N,y}$ converge weakly in $X$ to $\mu^y$ $\nu$-almost surely.

Proof. Fix $y \in X^{\beta-s}$. Let $f : X \to \mathbb{R}$ be continuous and bounded. Then by (4.1), (4.3) and Lemma 2.5(i), we have that

$$\int_X f(u)\,\mu^{N,y}(du) = \frac{1}{Z^N}\int_{X^{s+\beta-2\ell}} f(u)e^{-\Phi^N(u,y)}\,\mu_0(du)$$

and

$$\int_X f(u)\,\mu^{y}(du) = \frac{1}{Z}\int_{X^{s+\beta-2\ell}} f(u)e^{-\Phi(u,y)}\,\mu_0(du).$$

Let $u \in X^{s+\beta-2\ell}$ and set $r_1 = \max\{\|u\|_{s+\beta-2\ell}, \|y\|_{\beta-s}\}$ to get, by Lemma 4.3(iii), that $\Phi^N(u,y) \to \Phi(u,y)$, since $\|P^N u\|_{s+\beta-2\ell} \le \|u\|_{s+\beta-2\ell} \le r_1$. By Lemma 4.3(i), for any $\delta > 0$, for $r_2 = \|y\|_{\beta-s}$, there exists $M(\delta, r_2) \in \mathbb{R}$ such that

$$\left|f(u)e^{-\Phi^N(u,y)}\right| \le \|f\|_\infty e^{\delta\|u\|^2_{s+\beta-2\ell} - M(\delta,r_2)}, \quad \forall u \in X^{s+\beta-2\ell},$$

where the right hand side is $\mu_0$-integrable for $\delta$ sufficiently small by the Fernique Theorem [3, Theorem 2.8.5]. Hence, by the Dominated Convergence Theorem, we have that $\int_X f(u)\,\mu^{N,y}(du) \to \int_X f(u)\,\mu^{y}(du)$, as $N \to \infty$, where we get the convergence of the constants $Z^N \to Z$ by choosing $f \equiv 1$. Thus we have $\mu^{N,y} \Rightarrow \mu^y$. Recalling that $y \in X^{\beta-s}$ $\nu$-almost surely completes the proof. □

We are now ready to prove Theorem 4.2:

Proof of Theorem 4.2. By Proposition 4.6 we have that $\mu^{N,y}$ converge weakly in $X$ to the measure $\mu^y$, $\nu$-almost surely. Since by Lemma 4.5, the measures $\mu^{N,y}$ are Gaussian, the limiting measure $\mu^y$ is also Gaussian. To see this we argue as follows. The weak convergence of measures implies the pointwise convergence of the Fourier transforms of the measures, thus by Levy's continuity theorem [13, Theorem 4.3] all the one dimensional projections of $\mu^{N,y}$, which are Gaussian, converge weakly to the corresponding one dimensional projections of $\mu^y$. By the fact that the class of Gaussian distributions in $\mathbb{R}$ is closed under weak convergence [13, Chapter 4, Exercise 2], we get that all the one dimensional projections of $\mu^y$ are Gaussian, thus $\mu^y$ is a Gaussian measure in $X$, $\mu^y = \mathcal{N}(\bar{m}, \bar{\mathcal{C}})$ for some $\bar{m} \in X$ and a self-adjoint, positive semi-definite, trace class linear operator $\bar{\mathcal{C}}$. It suffices to show that $\bar{m} = m$ and $\bar{\mathcal{C}} = \mathcal{C}$.

We use the standard Galerkin method to show that $m^N \to m$ in $X^1$. Indeed, since by their definition $m^N$ solve (1.7) in the $N$-dimensional spaces $X^N$, for $e = m - m^N$, we have that $B(e,v) = 0$, $\forall v \in X^N$. By the coercivity and the continuity of $B$ (see Proposition 3.2)

$$\|e\|_1^2 \le cB(e,e) = cB(e, m - z) \le c\|e\|_1\|m - z\|_1, \quad \forall z \in X^N.$$

Choose $z = P^N m$ to obtain

$$\|m - m^N\|_1 \le c\|m - P^N m\|_1,$$

where as $N \to \infty$ the right hand side converges to zero since $m \in X^1$. On the other hand, by [3, Example 3.8.15], we have that $m^N \to \bar{m}$ in $X$, hence we conclude that $\bar{m} = m$, as required.

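For the identification of the covariance operator, note that by the definition of $\mathcal{C}^N$ we have

$$\mathcal{C}^N = P^N\mathcal{C}P^N + (I - P^N)\mathcal{C}_0(I - P^N).$$

Recall that $\{\phi_k\}_{k=1}^\infty$ are the eigenfunctions of $\mathcal{C}_0$ and fix $k \in \mathbb{N}$. Then, for $N > k$ and any $w \in X$, we have that

$$|\langle w, \mathcal{C}^N\phi_k\rangle - \langle w, \mathcal{C}\phi_k\rangle| = |\langle w, (P^N - I)\mathcal{C}\phi_k\rangle| \le \|(P^N - I)w\|\,\|\mathcal{C}\phi_k\|,$$

where the right hand side converges to zero as $N \to \infty$, since $w \in X$. This implies that $\mathcal{C}^N\phi_k$ converges to $\mathcal{C}\phi_k$ weakly in $X$, as $N \to \infty$, and this holds for any $k \in \mathbb{N}$. On the other hand by [3, Example 3.8.15], we have that $\mathcal{C}^N\phi_k \to \bar{\mathcal{C}}\phi_k$ in $X$, as $N \to \infty$, for all $k \in \mathbb{N}$. It follows that $\mathcal{C}\phi_k = \bar{\mathcal{C}}\phi_k$ for every $k$, and since $\{\phi_k\}_{k=1}^\infty$ is an orthonormal basis of $X$, we have that $\bar{\mathcal{C}} = \mathcal{C}$. □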

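5. Operator norm bounds on $\mathcal{B}_\lambda^{-1}$. The following propositions contain several operator norm estimates on the inverse of $\mathcal{B}_\lambda$ and related quantities, and in particular estimates on the singular dependence of this operator as $\lambda \to 0$. These are the key tools used in Section 6 to obtain posterior contraction results. In all of them we make use of the interpolation inequality in Hilbert scales, [7, Proposition 8.19]. Recall that we consider $\mathcal{B}_\lambda$ defined on $X^1$, as explained in Remark 3.3.

Proposition 5.1. Let $\eta_1 = (1 - \theta_1)(\beta - 2\ell) + \theta_1$, where $\theta_1 \in [0,1]$. Under the Assumptions 2.4(1), (2) and (6) the following operator norm bounds hold: there is $c > 0$ independent of $\theta_1$ such that

$$\|\mathcal{B}_\lambda^{-1}\mathcal{A}^{-1}\mathcal{C}_1^{-1}\|_{\mathcal{L}(X^{2\beta-2\ell-\eta_1}, X^{\beta-2\ell})} \le c\lambda^{-\frac{\theta_1}{2}}$$

and

$$\|\mathcal{B}_\lambda^{-1}\mathcal{A}^{-1}\mathcal{C}_1^{-1}\|_{\mathcal{L}(X^{2\beta-2\ell-\eta_1}, X^{1})} \le c\lambda^{-\frac{\theta_1+1}{2}}.$$

Proof. Let $h \in X^{2\beta-2\ell-\eta_1} = X^{\beta-\theta_1\Delta}$. Then $\mathcal{A}^{-1}\mathcal{C}_1^{-1}h \in X^{-1}$. Indeed, by Assumption 2.4(6) for $\eta = 1$,

$$\|\mathcal{C}_0^{1/2}\mathcal{A}^{-1}\mathcal{C}_1^{-1}h\| \le c\|\mathcal{C}_0^{\frac{1}{2}+\ell-\beta}h\| = c\|h\|_{2\beta-2\ell-1} = c\|h\|_{\beta-\Delta} \le c\|h\|_{\beta-\theta_1\Delta}.$$

By Proposition 3.2 for $r = \mathcal{A}^{-1}\mathcal{C}_1^{-1}h$, there exists a unique weak solution of (3.1), $z \in X^1$. By Definition 3.1, for $v = z \in X^1$, we get

$$\|\mathcal{C}_1^{-1/2}\mathcal{A}^{-1}z\|^2 + \lambda\|\mathcal{C}_0^{-1/2}z\|^2 = \langle \mathcal{C}_0^{\eta_1/2}\mathcal{A}^{-1}\mathcal{C}_1^{-1}h, \mathcal{C}_0^{-\eta_1/2}z\rangle.$$

Using the Assumptions 2.4(2) and (6), and the Cauchy-Schwarz inequality, we get

$$\|z\|^2_{\beta-2\ell} + \lambda\|z\|^2_1 \le c\|\mathcal{C}_0^{\frac{\eta_1}{2}+\ell-\beta}h\|\,\|z\|_{\eta_1}.$$

We interpolate the norm on $z$ appearing on the right hand side between the norms on $z$ appearing on the left hand side, then use the Cauchy with $\varepsilon$ inequality, and then Young's inequality for $p = \frac{1}{1-\theta_1}$, $q = \frac{1}{\theta_1}$, to get successively, for $c > 0$ a changing constant,

$$\|z\|^2_{\beta-2\ell} + \lambda\|z\|^2_1 \le c\left(\lambda^{-\frac{\theta_1}{2}}\|\mathcal{C}_0^{\frac{\eta_1}{2}+\ell-\beta}h\|\right)\left(\|z\|^{1-\theta_1}_{\beta-2\ell}\,(\lambda^{\frac{1}{2}}\|z\|_1)^{\theta_1}\right)$$
$$\le \frac{c}{2\varepsilon}\left(\lambda^{-\frac{\theta_1}{2}}\|\mathcal{C}_0^{\frac{\eta_1}{2}+\ell-\beta}h\|\right)^2 + \frac{\varepsilon}{2}\left(\|z\|^{2(1-\theta_1)}_{\beta-2\ell}\,(\lambda\|z\|^2_1)^{\theta_1}\right)$$
$$\le \frac{c}{2\varepsilon}\left(\lambda^{-\frac{\theta_1}{2}}\|\mathcal{C}_0^{\frac{\eta_1}{2}+\ell-\beta}h\|\right)^2 + \frac{\varepsilon}{2}\left((1-\theta_1)\|z\|^2_{\beta-2\ell} + \theta_1\lambda\|z\|^2_1\right).$$

By choosing $\varepsilon > 0$ small enough we get, for $c > 0$ independent of $\theta_1$, $\lambda$,

$$\|z\|_{\beta-2\ell} \le c\lambda^{-\frac{\theta_1}{2}}\|\mathcal{C}_0^{\frac{\eta_1}{2}+\ell-\beta}h\| \quad \text{and} \quad \|z\|_1 \le c\lambda^{-\frac{\theta_1+1}{2}}\|\mathcal{C}_0^{\frac{\eta_1}{2}+\ell-\beta}h\|.$$

Replacing $z = \mathcal{B}_\lambda^{-1}\mathcal{A}^{-1}\mathcal{C}_1^{-1}h$ gives the result. □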

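Proposition 5.2. Let $\eta_2 = (1 - \theta_2)(\beta - 2\ell) + \theta_2$, where $\theta_2 \in [0,1]$. Under the Assumptions 2.4(1) and (2), the following operator norm bounds hold: there is $c > 0$ independent of $\theta_2$ such that

$$\|\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-1}\|_{\mathcal{L}(X^{2-\eta_2}, X^{\beta-2\ell})} \le c\lambda^{-\frac{\theta_2}{2}}$$

and

$$\|\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-1}\|_{\mathcal{L}(X^{2-\eta_2}, X^{1})} \le c\lambda^{-\frac{\theta_2+1}{2}}.$$

Proof. Let $h \in X^{2-\eta_2}$. Then $\mathcal{C}_0^{-1}h \in X^{-1}$, since $\eta_2 \le 1$. By Proposition 3.2 for $r = \mathcal{C}_0^{-1}h$, there exists a unique weak solution of (3.1), $z \in X^1$. By Definition 3.1, for $v = z \in X^1$, we have

$$\|\mathcal{C}_1^{-1/2}\mathcal{A}^{-1}z\|^2 + \lambda\|\mathcal{C}_0^{-1/2}z\|^2 = \langle \mathcal{C}_0^{\frac{\eta_2}{2}-1}h, \mathcal{C}_0^{-\frac{\eta_2}{2}}z\rangle,$$

and by the Assumption 2.4(2) and the Cauchy-Schwarz inequality, we get

$$\|z\|^2_{\beta-2\ell} + \lambda\|z\|^2_1 \le c\|\mathcal{C}_0^{\frac{\eta_2}{2}-1}h\|\,\|z\|_{\eta_2}.$$

We interpolate the norm on $z$ appearing on the right hand side between the norms on $z$ appearing on the left hand side to get, as in the proof of Proposition 5.1, for $c > 0$ independent of $\theta_2$, $\lambda$,

$$\|z\|_{\beta-2\ell} \le c\lambda^{-\frac{\theta_2}{2}}\|\mathcal{C}_0^{\frac{\eta_2}{2}-1}h\| \quad \text{and} \quad \|z\|_1 \le c\lambda^{-\frac{\theta_2+1}{2}}\|\mathcal{C}_0^{\frac{\eta_2}{2}-1}h\|.$$

Replacing $z = \mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-1}h$ gives the result. □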
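
The $\lambda$-dependence in Propositions 5.1 and 5.2 can be sanity-checked in the simultaneously diagonalizable setting of Section 8. The sketch below is only an illustration under that simplifying assumption: it takes $\mathcal{C}_0$ with eigenvalues $\mu_k = k^{-2}$, $\mathcal{C}_1 = \mathcal{C}_0^{\beta}$ and $\mathcal{A}^{-1} = \mathcal{C}_0^{\ell}$ exactly, so that $\mathcal{B}_\lambda$ has symbol $\mu^{2\ell-\beta} + \lambda\mu^{-1}$ and operator norms between Hilbert scale spaces become weighted suprema. The function names and the truncation at 2000 modes are hypothetical choices, not part of the paper.

```python
def weighted_sup(symbol, a, b, mus):
    """Norm of a diagonal operator from X^a to X^b, where ||u||_s = ||C0^{-s/2} u||."""
    return max(mu ** (-b / 2) * symbol(mu) * mu ** (a / 2) for mu in mus)


def diagonal_ratios(lam, theta, beta, ell, mus):
    """Ratios (computed norm) / (claimed lambda-rate) for the four bounds."""
    eta = (1 - theta) * (beta - 2 * ell) + theta
    b_lam = lambda mu: mu ** (2 * ell - beta) + lam / mu   # symbol of B_lambda
    sym_51 = lambda mu: mu ** (ell - beta) / b_lam(mu)     # B^{-1} A^{-1} C1^{-1}
    sym_52 = lambda mu: (1.0 / mu) / b_lam(mu)             # B^{-1} C0^{-1}
    rate_weak = lam ** (-theta / 2)                        # target space X^{beta-2l}
    rate_strong = lam ** (-(theta + 1) / 2)                # target space X^1
    return (
        weighted_sup(sym_51, 2 * beta - 2 * ell - eta, beta - 2 * ell, mus) / rate_weak,
        weighted_sup(sym_51, 2 * beta - 2 * ell - eta, 1, mus) / rate_strong,
        weighted_sup(sym_52, 2 - eta, beta - 2 * ell, mus) / rate_weak,
        weighted_sup(sym_52, 2 - eta, 1, mus) / rate_strong,
    )


# Eigenvalues of C0 decaying like k^{-2}, and beta = ell = 1/2 as in Section 8.
mus = [k ** -2.0 for k in range(1, 2001)]
ratios = [
    r
    for lam in (1e-2, 1e-4)
    for theta in (0.0, 0.5, 1.0)
    for r in diagonal_ratios(lam, theta, 0.5, 0.5, mus)
]
print(max(ratios) <= 1.0 + 1e-9)
```

In this idealised case every ratio reduces to $\sup_t t^{a}/(1+t)$ with $a \in [0,1]$ and $t = \lambda\mu^{-\Delta}$, which is at most one, so the constant $c$ can be taken equal to 1 here; the general propositions do not claim this.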

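Proposition 5.3. Let $\eta_3 = (1 - \theta_3)(\beta - 2\ell - s) + \theta_3(1 - s)$, where $\theta_3 \in [0,1]$ and $s \in (s_0,1]$, where $s_0 \in [0,1)$ as defined in Lemma 2.2. Under the Assumptions 2.4(1) and (2), the following norm bounds hold: there is $c > 0$ independent of $\theta_3$ such that

$$\|\mathcal{C}_0^{-s/2}\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-s/2}\|_{\mathcal{L}(X^{-\eta_3}, X^{\beta-2\ell-s})} \le c\lambda^{-\frac{\theta_3}{2}}$$

and

$$\|\mathcal{C}_0^{-s/2}\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-s/2}\|_{\mathcal{L}(X^{-\eta_3}, X^{1-s})} \le c\lambda^{-\frac{\theta_3+1}{2}}.$$

Furthermore,

$$\|\mathcal{C}_0^{-s/2}\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-s/2}\|_{\mathcal{L}(X)} \le c\lambda^{-\frac{s+2\ell-\beta}{\Delta}}, \quad \forall s \in (\{\beta - 2\ell\} \vee s_0, 1].$$

Proof. Let $h \in X^{-\eta_3} = X^{s+2\ell-\beta-\theta_3\Delta}$. Then $h \in X^{s-1}$, since $\Delta > 0$, thus $\mathcal{C}_0^{-s/2}h \in X^{-1}$. By Proposition 3.2 for $r = \mathcal{C}_0^{-s/2}h$, there exists a unique weak solution of (3.1), $z' \in X^1$. Since for $v \in X^{1-s}$ we have that $\mathcal{C}_0^{s/2}v \in X^1$, we conclude that for any $v \in X^{1-s}$

$$\langle \mathcal{C}_1^{-1/2}\mathcal{A}^{-1}\mathcal{C}_0^{s/2}z, \mathcal{C}_1^{-1/2}\mathcal{A}^{-1}\mathcal{C}_0^{s/2}v\rangle + \lambda\langle \mathcal{C}_0^{\frac{s-1}{2}}z, \mathcal{C}_0^{\frac{s-1}{2}}v\rangle = \langle \mathcal{C}_0^{-s/2}h, \mathcal{C}_0^{s/2}v\rangle,$$

where $z = \mathcal{C}_0^{-s/2}z' \in X^{1-s}$. Choosing $v = z \in X^{1-s}$, we get

$$\|\mathcal{C}_1^{-1/2}\mathcal{A}^{-1}\mathcal{C}_0^{s/2}z\|^2 + \lambda\|\mathcal{C}_0^{\frac{s-1}{2}}z\|^2 = \langle h, z\rangle.$$

By the Assumption 2.4(2) and the Cauchy-Schwarz inequality, we have

$$\|z\|^2_{\beta-2\ell-s} + \lambda\|z\|^2_{1-s} \le c\|h\|_{-\eta_3}\|z\|_{\eta_3}.$$

We interpolate the norm of $z$ appearing on the right hand side between the norms of $z$ appearing on the left hand side, to get as in the proof of Proposition 5.1, for $c > 0$ independent of $\theta_3$, $\lambda$ and $s$,

$$\|z\|_{\beta-2\ell-s} \le c\lambda^{-\frac{\theta_3}{2}}\|h\|_{-\eta_3} \quad \text{and} \quad \|z\|_{1-s} \le c\lambda^{-\frac{\theta_3+1}{2}}\|h\|_{-\eta_3}.$$

Replacing $z = \mathcal{C}_0^{-s/2}\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-s/2}h$ gives the first two rates.

For the last claim, note that we can always choose $\{\beta - 2\ell\} \vee \{s_0\} < s \le 1$, since $s_0 < 1$ and $\Delta > 0$. Using the first two estimates, for $\eta_{30} = (1 - \theta_{30})(\beta - 2\ell - s) + \theta_{30}(1 - s) = 0$, that is $\theta_{30} = \frac{s+2\ell-\beta}{\Delta} \in [0,1]$, we have that

$$\|\mathcal{C}_0^{-s/2}\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-s/2}\|_{\mathcal{L}(X^{s+2\ell-\beta}, X)} \le c\lambda^{-\frac{\theta_{30}}{2}}$$

and

$$\|\mathcal{C}_0^{-s/2}\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-s/2}\|_{\mathcal{L}(X^{s-1}, X)} \le c\lambda^{-\frac{\theta_{30}+1}{2}}.$$

Let $u \in X$. Then, for any $t > 0$, we have the decomposition

$$u = \sum_{k=1}^\infty u_k\phi_k = \sum_{\lambda_k \le t} u_k\phi_k + \sum_{\lambda_k > t} u_k\phi_k =: \underline{u} + \overline{u},$$

where $\{\phi_k\}_{k=1}^\infty$ are the eigenfunctions of $\mathcal{C}_0$ and $u_k := \langle u, \phi_k\rangle$. Since $1 - s \ge 0$ and $\beta - 2\ell - s < 0$, we have

$$\|\mathcal{C}_0^{-s/2}\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-s/2}u\| \le \|\mathcal{C}_0^{-s/2}\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-s/2}\underline{u}\| + \|\mathcal{C}_0^{-s/2}\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-s/2}\overline{u}\|$$
$$\le c\lambda^{-\frac{\theta_{30}+1}{2}}\|\underline{u}\|_{s-1} + c\lambda^{-\frac{\theta_{30}}{2}}\|\overline{u}\|_{s+2\ell-\beta}$$
$$\le c\lambda^{-\frac{\theta_{30}+1}{2}}t^{1-s}\|u\| + c\lambda^{-\frac{\theta_{30}}{2}}t^{\beta-2\ell-s}\|u\|.$$

The first term on the right hand side is increasing in $t$, while the second is decreasing, so we can optimize by choosing $t = t(\lambda)$ making the two terms equal, that is $t = \lambda^{\frac{1}{2\Delta}}$, to obtain the claimed rate. □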
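
The final balancing step is elementary arithmetic on exponents and can be verified directly. The sketch below assumes the two competing terms have the form $\lambda^{-(\theta_{30}+1)/2}t^{1-s}$ and $\lambda^{-\theta_{30}/2}t^{\beta-2\ell-s}$ with $\theta_{30} = (s+2\ell-\beta)/\Delta$, and checks that the choice $t = \lambda^{1/(2\Delta)}$ makes them equal to $\lambda^{-\theta_{30}}$. The function name and the sample parameter values are illustrative only.

```python
import math


def balanced_terms(lam, s, beta, ell):
    """Evaluate both terms of the splitting at t = lam**(1/(2*Delta))."""
    delta = 2 * ell - beta + 1
    theta30 = (s + 2 * ell - beta) / delta
    t = lam ** (1 / (2 * delta))
    high = lam ** (-(theta30 + 1) / 2) * t ** (1 - s)          # increasing in t
    low = lam ** (-theta30 / 2) * t ** (beta - 2 * ell - s)    # decreasing in t
    claimed = lam ** (-theta30)                                # rate in Proposition 5.3
    return high, low, claimed


# Sample values with beta = ell = 1/2 and s in ((beta - 2*ell) v s0, 1] for s0 = 1/2.
high, low, claimed = balanced_terms(lam=1e-3, s=0.75, beta=0.5, ell=0.5)
print(math.isclose(high, low), math.isclose(high, claimed))
```

The identity behind the check is $(1-s)/\Delta = 1 - \theta_{30}$, which turns both exponents of $\lambda$ into $-\theta_{30}$.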

6. Posterior Contraction. In this section we employ the developments of the preceding sections to study the posterior consistency of the Bayesian solution to the inverse problem. That is, we consider a family of data sets $y^\dagger = y^\dagger(n)$ given by (1.10) and study the limiting behavior of the posterior measure $\mu^{y^\dagger}_{\lambda,n} = \mathcal{N}(m^\dagger_\lambda, \mathcal{C}_{\lambda,n})$ as $n \to \infty$. Intuitively we would hope to recover a measure which concentrates near the true solution $u^\dagger$ in this limit. Following the approach in [14], [9], [24] and [8], we quantify this idea as in (1.12). By the Markov inequality we have

$$\mathbb{E}^{y^\dagger}\mu^{y^\dagger}_{\lambda,n}\left\{u : \|u - u^\dagger\| \ge M_n\varepsilon_n\right\} \le \frac{1}{M_n^2\varepsilon_n^2}\,\mathbb{E}^{y^\dagger}\int \|u - u^\dagger\|^2\,\mu^{y^\dagger}_{\lambda,n}(du),$$

so that it suffices to show that

(6.1)   $$\mathbb{E}^{y^\dagger}\int \|u - u^\dagger\|^2\,\mu^{y^\dagger}_{\lambda,n}(du) \le c\varepsilon_n^2.$$

In addition to $n^{-1}$, there is a second small parameter in the problem, namely the regularization parameter, $\lambda$, and we will choose a relationship between $n$ and $\lambda$ in order to optimize the convergence rates $\varepsilon_n$. We will show that determination of optimal convergence rates follows directly from the operator norm bounds on $\mathcal{B}_\lambda^{-1}$ derived in the previous section, which concern only $\lambda$ dependence; relating $n$ to $\lambda$ then follows as a trivial optimization. Thus, the $\lambda$ dependence of the operator norm bounds in the previous section forms the heart of the posterior contraction analysis.

We now present our convergence results. In Theorem 6.1 and Corollary 6.4 we 
study the convergence of the posterior mean to the true solution in a range of norms, 
while in Theorem 6.2 and Corollary 6.5 we study the concentration of the posterior 
near the true solution as described in (1.12). The proofs of the two theorems are 
provided later in the current section. 

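Theorem 6.1. Let $u^\dagger \in X^1$. Under the Assumptions 2.1 and 2.4, we have that, for the choice $\tau = \tau(n) = n^{\frac{\theta_2-\theta_1-1}{2(\theta_1-\theta_2+2)}}$ and for any $\theta \in [0,1]$,

$$\mathbb{E}^{y^\dagger}\|m^\dagger_\lambda - u^\dagger\|^2_\eta \le cn^{\frac{\theta+\theta_2-2}{\theta_1-\theta_2+2}},$$

where $\eta = (1 - \theta)(\beta - 2\ell) + \theta$. The result holds for any $\theta_1, \theta_2 \in [0,1]$, chosen so that $\mathbb{E}(\kappa^2) < \infty$, for $\kappa = \max\{\|\xi\|_{2\beta-2\ell-\eta_1}, \|u^\dagger\|_{2-\eta_2}\}$, where $\eta_i = (1 - \theta_i)(\beta - 2\ell) + \theta_i$, $i = 1,2$.

Theorem 6.2. Let $u^\dagger \in X^1$. Under the Assumptions 2.1 and 2.4, we have that, for $\tau = \tau(n) = n^{\frac{\theta_2-\theta_1-1}{2(\theta_1-\theta_2+2)}}$, the convergence in (1.12) holds with

$$\varepsilon_n = n^{\frac{\theta_0+\theta_2-2}{2(\theta_1-\theta_2+2)}}, \qquad \theta_0 = \begin{cases} \frac{2\ell-\beta}{\Delta}, & \text{if } \beta - 2\ell \le 0, \\ 0, & \text{otherwise.} \end{cases}$$

The result holds for any $\theta_1, \theta_2 \in [0,1]$, chosen so that $\mathbb{E}(\kappa^2) < \infty$, for $\kappa = \max\{\|\xi\|_{2\beta-2\ell-\eta_1}, \|u^\dagger\|_{2-\eta_2}\}$, where $\eta_i = (1 - \theta_i)(\beta - 2\ell) + \theta_i$, $i = 1,2$.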

Remark 6.3. i) In order to have convergence in the PDE method we need $\|u^\dagger\|^2_{2-\eta_2} < \infty$ for a $\theta_2 \le 1$. If we have the a-priori information that $u^\dagger \in X^\gamma$, then we need to have $\gamma \ge 2 - \eta_2 = 1 + (1 - \theta_2)\Delta$ for some $\theta_2 \in [0,1]$. This means that the minimum requirement for convergence is $\gamma \ge 1$, which is compatible with our assumption $u^\dagger \in X^1$. On the other hand, in order to have the optimal rate (which corresponds to choosing $\theta_2$ as small as possible) we need to choose $\theta_2 = \frac{\Delta+1-\gamma}{\Delta}$. If $\gamma > 1 + \Delta$ then the right hand side is negative so we have to choose $\theta_2 = 0$, hence we cannot achieve the optimal rate. We say that the method saturates at $\gamma = 1 + \Delta$, which reflects the fact that the true solution has more regularity than the method allows us to exploit in order to obtain faster convergence rates.

ii) In order to have convergence we also need $\mathbb{E}\|\xi\|^2_{2\beta-2\ell-\eta_1} < \infty$ for a $\theta_1 \le 1$. By Lemma 2.5(iii), it suffices to have $\theta_1 > \frac{s_0}{\Delta}$. This means that we need $\Delta > s_0$, which holds by the Assumption 2.4(1), in order to be able to choose $\theta_1 \le 1$. On the other hand, since $\Delta > 0$ and $\sigma_0 \le 1$, we have that $\frac{s_0}{\Delta} \ge 0$, thus we can always choose $\theta_1$ in an optimal way, that is, we can always choose $\theta_1 = \frac{s_0+\varepsilon}{\Delta}$ where $\varepsilon > 0$ is arbitrarily small.

iii) If we want draws from $\mu_0$ to be in $X^\gamma$ then by Lemma 2.2(ii) we need $\sigma_0 > \gamma$. Since the minimum requirement for the method to give convergence is $\gamma \ge 1$ while $\sigma_0 \le 1$, this means that we can never have draws exactly matching the regularity of the prior. On the other hand if we want an undersmoothing prior (which according to [14] in the diagonal case gives asymptotic coverage equal to 1) we need $\sigma_0 < \gamma$, which we always have since $\gamma \ge 1$ and $\sigma_0 \le 1$. This, as discussed in Section 1, gives an explanation to the observation that in both of the above theorems we always have $\tau \to 0$ as $n \to \infty$.

iv) When $\beta - 2\ell > 0$, in Theorem 6.2 and in Corollary 6.5 below, we get suboptimal rates. The reason is that our analysis to obtain the error in the $X$-norm is based on interpolating between the error in the $X^{\beta-2\ell}$-norm and the error in the $X^1$-norm. When $\beta - 2\ell > 0$, interpolation is not possible since the $X$-norm is now weaker than the $X^{\beta-2\ell}$-norm. However, we can at least bound the error in the $X$-norm by the error in the $X^{\beta-2\ell}$-norm, thus obtaining a suboptimal rate. Note that the case $\beta - 2\ell > 0$ does not necessarily correspond to the well posed case: by Lemma 2.5 we can only guarantee that a draw from the noise distribution lives in $X^\rho$, $\rho < \beta - s_0$, while the range of $\mathcal{A}^{-1}$ is formally $X^{2\ell}$. Hence, in order to have a well posed problem we need $\beta - s_0 > 2\ell$, or equivalently $\Delta < \sigma_0$. This can happen despite our assumption $\Delta > 2s_0$, when $s_0 < 1/3$ and for appropriate choice of $\ell$ and $\beta$. In this case, regularization is unnecessary.

The following two corollaries are a direct consequence of the last remark: 

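Corollary 6.4. Assume $u^\dagger \in X^\gamma$, where $\gamma \ge 1$, and let $\eta = (1 - \theta)(\beta - 2\ell) + \theta$, where $\theta \in [0,1]$. Under the Assumptions 2.1 and 2.4, we have the following optimized rates of convergence, where $\varepsilon > 0$ is arbitrarily small:

i) if $\gamma \in (1, \Delta + 1]$, for $\tau = \tau(n) = n^{-\frac{\gamma-1+s_0+\varepsilon}{2(\Delta+\gamma-1+s_0+\varepsilon)}}$,

$$\mathbb{E}^{y^\dagger}\|m^\dagger_\lambda - u^\dagger\|^2_\eta \le cn^{-\frac{\Delta+\gamma-1-\theta\Delta}{\Delta+\gamma-1+s_0+\varepsilon}};$$

ii) if $\gamma > \Delta + 1$, for $\tau = \tau(n) = n^{-\frac{\Delta+s_0+\varepsilon}{2(2\Delta+s_0+\varepsilon)}}$,

$$\mathbb{E}^{y^\dagger}\|m^\dagger_\lambda - u^\dagger\|^2_\eta \le cn^{-\frac{(2-\theta)\Delta}{2\Delta+s_0+\varepsilon}};$$

iii) if $\gamma = 1$ and $\theta \in [0,1)$, for $\tau = \tau(n) = n^{-\frac{s_0+\varepsilon}{2(\Delta+s_0+\varepsilon)}}$,

$$\mathbb{E}^{y^\dagger}\|m^\dagger_\lambda - u^\dagger\|^2_\eta \le cn^{-\frac{(1-\theta)\Delta}{\Delta+s_0+\varepsilon}}.$$

If $\gamma = 1$ and $\theta = 1$ then the method does not give convergence.

Corollary 6.5. Assume $u^\dagger \in X^\gamma$, where $\gamma \ge 1$. Under the Assumptions 2.1 and 2.4, we have the following optimized rates for the convergence in (1.12), where $\varepsilon > 0$ is arbitrarily small:

i) if $\gamma \in [1, \Delta + 1]$, for $\tau = \tau(n) = n^{-\frac{\gamma-1+s_0+\varepsilon}{2(\Delta+\gamma-1+s_0+\varepsilon)}}$,

$$\varepsilon_n = \begin{cases} n^{-\frac{\gamma}{2(\Delta+\gamma-1+s_0+\varepsilon)}}, & \text{if } \beta - 2\ell \le 0, \\ n^{-\frac{\Delta+\gamma-1}{2(\Delta+\gamma-1+s_0+\varepsilon)}}, & \text{otherwise;} \end{cases}$$

ii) if $\gamma > \Delta + 1$, for $\tau = \tau(n) = n^{-\frac{\Delta+s_0+\varepsilon}{2(2\Delta+s_0+\varepsilon)}}$,

$$\varepsilon_n = \begin{cases} n^{-\frac{\Delta+1}{2(2\Delta+s_0+\varepsilon)}}, & \text{if } \beta - 2\ell \le 0, \\ n^{-\frac{\Delta}{2\Delta+s_0+\varepsilon}}, & \text{otherwise.} \end{cases}$$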
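
The two corollaries follow from Theorems 6.1 and 6.2 by inserting the choices of $\theta_1$, $\theta_2$ discussed in Remark 6.3. As a hedged illustration, the sketch below evaluates the resulting exponents of $\tau(n)$ and $\varepsilon_n$ from $\theta_1 = (s_0+\varepsilon)/\Delta$, $\theta_2 = \max\{0, (\Delta+1-\gamma)/\Delta\}$ and $\theta_0$ as in Theorem 6.2. The function name is hypothetical, $\varepsilon = 0$ is used as the limiting value, and the sample call assumes the setting of Section 7 with $\beta = 0$, $\ell = 1/2$ and $s_0 = d/4$ (the value of $s_0$ is inferred from the rates stated in Theorem 7.4, not quoted from the text).

```python
def optimized_exponents(gamma, beta, ell, s0, eps=0.0):
    """Exponents p, q with tau(n) = n**p and eps_n = n**q (Theorem 6.2 with optimized thetas)."""
    if gamma < 1:
        raise ValueError("the method requires gamma >= 1")
    delta = 2 * ell - beta + 1
    theta1 = (s0 + eps) / delta                       # Remark 6.3(ii)
    theta2 = max(0.0, (delta + 1 - gamma) / delta)    # Remark 6.3(i), saturates at 0
    theta0 = (2 * ell - beta) / delta if beta - 2 * ell <= 0 else 0.0
    denom = theta1 - theta2 + 2
    tau_exp = (theta2 - theta1 - 1) / (2 * denom)
    eps_n_exp = (theta0 + theta2 - 2) / (2 * denom)
    return tau_exp, eps_n_exp


d = 1  # spatial dimension in the example of Section 7 (assumed s0 = d / 4)
tau_exp, eps_n_exp = optimized_exponents(gamma=2.0, beta=0.0, ell=0.5, s0=d / 4)
sat_tau_exp, sat_eps_n_exp = optimized_exponents(gamma=4.0, beta=0.0, ell=0.5, s0=d / 4)
print(tau_exp, eps_n_exp, sat_eps_n_exp)
```

For $\gamma = 2$ the exponents agree with $-\gamma/(2(\Delta+\gamma-1+s_0))$ from Corollary 6.5(i), and for $\gamma = 4 > \Delta + 1$ with $-(\Delta+1)/(2(2\Delta+s_0))$ from Corollary 6.5(ii).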

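Note that, since the posterior is Gaussian, the left hand side in (6.1) is the Square Posterior Contraction

(6.2)   $$SPC = \mathbb{E}^{y^\dagger}\|m^\dagger_\lambda - u^\dagger\|^2 + \mathrm{tr}(\mathcal{C}_{\lambda,n}),$$

which is the sum of the mean integrated squared error (MISE) and the posterior spread. Let $u^\dagger \in X^1$. By Lemma 3.4, the relationship (1.10) between $u^\dagger$ and $y^\dagger$ and the equation (1.11) for $m^\dagger_\lambda$, we obtain

$$\mathcal{B}_\lambda m^\dagger_\lambda = \mathcal{A}^{-1}\mathcal{C}_1^{-1}\mathcal{A}^{-1}u^\dagger + \frac{1}{\sqrt{n}}\mathcal{A}^{-1}\mathcal{C}_1^{-1}\xi \quad \text{and} \quad \mathcal{B}_\lambda u^\dagger = \mathcal{A}^{-1}\mathcal{C}_1^{-1}\mathcal{A}^{-1}u^\dagger + \lambda\mathcal{C}_0^{-1}u^\dagger,$$

where the equations hold in $X^{-1}$, since by a similar argument to the proof of Proposition 3.5 we have $m^\dagger_\lambda \in X^1$. By subtraction we get

$$\mathcal{B}_\lambda(m^\dagger_\lambda - u^\dagger) = \frac{1}{\sqrt{n}}\mathcal{A}^{-1}\mathcal{C}_1^{-1}\xi - \lambda\mathcal{C}_0^{-1}u^\dagger.$$

Therefore

(6.3)   $$m^\dagger_\lambda - u^\dagger = \mathcal{B}_\lambda^{-1}\left(\frac{1}{\sqrt{n}}\mathcal{A}^{-1}\mathcal{C}_1^{-1}\xi - \lambda\mathcal{C}_0^{-1}u^\dagger\right),$$

as an equation in $X^1$. Using the fact that the noise has mean zero and the relation (1.6), equation (6.3) implies that we can split the square posterior contraction into three terms

(6.4)   $$SPC = \|\lambda\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-1}u^\dagger\|^2 + \mathbb{E}\Big\|\frac{1}{\sqrt{n}}\mathcal{B}_\lambda^{-1}\mathcal{A}^{-1}\mathcal{C}_1^{-1}\xi\Big\|^2 + \frac{1}{n}\mathrm{tr}(\mathcal{B}_\lambda^{-1}),$$

provided the right hand side is finite. A consequence of the proof of Theorem 4.2 is that $\mathcal{B}_\lambda^{-1}$ is trace class. Note that for $\zeta$, a white noise, we have that

$$\mathrm{tr}(\mathcal{B}_\lambda^{-1}) = \mathbb{E}\|\mathcal{B}_\lambda^{-1/2}\zeta\|^2 = \mathbb{E}\langle \zeta, \mathcal{B}_\lambda^{-1}\zeta\rangle = \mathbb{E}\langle \mathcal{C}_0^{s/2}\zeta, \mathcal{C}_0^{-s/2}\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-s/2}\mathcal{C}_0^{s/2}\zeta\rangle$$
$$\le \|\mathcal{C}_0^{-s/2}\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-s/2}\|_{\mathcal{L}(X)}\,\mathbb{E}\|\mathcal{C}_0^{s/2}\zeta\|^2,$$

which for $s > s_0$, since by Lemma 2.2 we have that $\mathbb{E}\|\mathcal{C}_0^{s/2}\zeta\|^2 < \infty$, provides the bound

(6.5)   $$\mathrm{tr}(\mathcal{B}_\lambda^{-1}) \le c\|\mathcal{C}_0^{-s/2}\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-s/2}\|_{\mathcal{L}(X)},$$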

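where $c > 0$ is independent of $\lambda$. If $q, r$ are chosen sufficiently large so that $\|\mathcal{C}_0^{q/2}u^\dagger\| < \infty$ and $\mathbb{E}\|\mathcal{C}_0^{r/2}\xi\|^2 < \infty$ then we see that

(6.6)   $$SPC \le c\left(\|\lambda\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-\frac{q}{2}-1}\|^2_{\mathcal{L}(X)} + \frac{1}{n}\|\mathcal{B}_\lambda^{-1}\mathcal{A}^{-1}\mathcal{C}_1^{-1}\mathcal{C}_0^{-\frac{r}{2}}\|^2_{\mathcal{L}(X)} + \frac{1}{n}\|\mathcal{C}_0^{-\frac{s}{2}}\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-\frac{s}{2}}\|_{\mathcal{L}(X)}\right),$$

where $c > 0$ is independent of $\lambda$ and $n$. Thus identifying $\varepsilon_n$ in (1.12) can be achieved simply through properties of the inverse of $\mathcal{B}_\lambda$ and its parametric dependence on $\lambda$.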

In the following, we are going to study convergence rates for the square posterior contraction, (6.4), which by the previous analysis will secure that the convergence in (1.12) holds for $\varepsilon_n^2 \to 0$ at a rate almost as fast as the square posterior contraction. This suggests that the error is determined by the MISE and the trace of the posterior covariance, thus we optimize our analysis with respect to these two quantities. In [14] the situation where $\mathcal{C}_0$, $\mathcal{C}_1$ and $\mathcal{A}$ are diagonalizable in the same eigenbasis is studied, and it is shown that the third term in equation (6.4) is bounded by the second term in terms of their parametric dependence on $\lambda$. The same idea is used in the proof of Theorem 6.2.

We now provide the proofs of Theorem 6.1 and Theorem 6.2. 

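Proof of Theorem 6.1. Since $\xi$ has zero mean, we have by (6.3)

$$\mathbb{E}\|m^\dagger_\lambda - u^\dagger\|^2_{\beta-2\ell} = \lambda^2\|\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-1}u^\dagger\|^2_{\beta-2\ell} + \frac{1}{n}\mathbb{E}\|\mathcal{B}_\lambda^{-1}\mathcal{A}^{-1}\mathcal{C}_1^{-1}\xi\|^2_{\beta-2\ell}$$

and

$$\mathbb{E}\|m^\dagger_\lambda - u^\dagger\|^2_{1} = \lambda^2\|\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-1}u^\dagger\|^2_{1} + \frac{1}{n}\mathbb{E}\|\mathcal{B}_\lambda^{-1}\mathcal{A}^{-1}\mathcal{C}_1^{-1}\xi\|^2_{1}.$$

Using Propositions 5.1 and 5.2, we get

$$\mathbb{E}\|m^\dagger_\lambda - u^\dagger\|^2_{\beta-2\ell} \le c\mathbb{E}(\kappa^2)\left(\lambda^{2-\theta_2} + \frac{1}{n}\lambda^{-\theta_1}\right) = c\mathbb{E}(\kappa^2)\left(n^{\theta_2-2}\tau^{2\theta_2-4} + n^{\theta_1-1}\tau^{2\theta_1}\right)$$

and

$$\mathbb{E}\|m^\dagger_\lambda - u^\dagger\|^2_{1} \le c\mathbb{E}(\kappa^2)\lambda^{-1}\left(\lambda^{2-\theta_2} + \frac{1}{n}\lambda^{-\theta_1}\right) = c\mathbb{E}(\kappa^2)\lambda^{-1}\left(n^{\theta_2-2}\tau^{2\theta_2-4} + n^{\theta_1-1}\tau^{2\theta_1}\right).$$

Since the common parenthesis term consists of a decreasing and an increasing term in $\tau$, we optimize the rate by choosing $\tau = \tau(n) = n^p$ such that the two terms become equal, that is, $p = \frac{\theta_2-\theta_1-1}{2(\theta_1-\theta_2+2)}$. We obtain

$$\mathbb{E}\|m^\dagger_\lambda - u^\dagger\|^2_{\beta-2\ell} \le c\mathbb{E}(\kappa^2)n^{\frac{\theta_2-2}{\theta_1-\theta_2+2}} \quad \text{and} \quad \mathbb{E}\|m^\dagger_\lambda - u^\dagger\|^2_{1} \le c\mathbb{E}(\kappa^2)n^{\frac{\theta_2-1}{\theta_1-\theta_2+2}}.$$

By interpolating between the two last estimates we obtain the claimed rate. □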
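
The choice of the exponent $p$ can be verified numerically. The sketch below assumes the relation $\lambda = 1/(n\tau^2)$, which is the relation implied by the identity $\lambda^{2-\theta_2} + \frac{1}{n}\lambda^{-\theta_1} = n^{\theta_2-2}\tau^{2\theta_2-4} + n^{\theta_1-1}\tau^{2\theta_1}$ used in the proofs of this section; the function names and the sample values of $n$, $\theta_1$, $\theta_2$ are illustrative only.

```python
import math


def balancing_exponent(theta1, theta2):
    """Exponent p such that tau = n**p equalises the two competing terms."""
    return (theta2 - theta1 - 1) / (2 * (theta1 - theta2 + 2))


def bias_and_noise_terms(n, tau, theta1, theta2):
    """The two terms lambda**(2 - theta2) and lambda**(-theta1) / n."""
    lam = 1.0 / (n * tau ** 2)  # assumed relation between lambda, n and tau
    return lam ** (2 - theta2), lam ** (-theta1) / n


n, theta1, theta2 = 1.0e6, 0.3, 0.6
tau = n ** balancing_exponent(theta1, theta2)
bias, noise = bias_and_noise_terms(n, tau, theta1, theta2)
rate = n ** ((theta2 - 2) / (theta1 - theta2 + 2))  # claimed rate in the X^{beta-2l} norm
print(math.isclose(bias, noise, rel_tol=1e-9), math.isclose(bias, rate, rel_tol=1e-9))
```

With this choice one has $\lambda = n^{-1/(\theta_1-\theta_2+2)}$, so both terms reduce to the same power of $n$.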
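Proof of Theorem 6.2. Recall equation (6.4)

$$SPC = \|\lambda\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-1}u^\dagger\|^2 + \mathbb{E}\Big\|\frac{1}{\sqrt{n}}\mathcal{B}_\lambda^{-1}\mathcal{A}^{-1}\mathcal{C}_1^{-1}\xi\Big\|^2 + \frac{1}{n}\mathrm{tr}(\mathcal{B}_\lambda^{-1}).$$

The idea is that the third term is always dominated by the second term. Combining equation (6.5) with Proposition 5.3, we have that

$$\frac{1}{n}\mathrm{tr}(\mathcal{B}_\lambda^{-1}) \le c\frac{1}{n}\lambda^{-\frac{s+2\ell-\beta}{\Delta}}, \quad \forall s \in (\{\beta - 2\ell\} \vee \{s_0\}, 1].$$

i) Suppose $\beta - 2\ell \le 0$, so that by interpolating between the rates provided by Propositions 5.1 and 5.2 we get for $\theta_0 = \frac{2\ell-\beta}{\Delta} \in [0,1]$

$$\mathbb{E}\Big\|\frac{1}{\sqrt{n}}\mathcal{B}_\lambda^{-1}\mathcal{A}^{-1}\mathcal{C}_1^{-1}\xi\Big\|^2 \le c\,\mathbb{E}\|\xi\|^2_{2\beta-2\ell-\eta_1}\frac{1}{n}\lambda^{-\theta_1-\theta_0}$$

and

$$\|\lambda\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-1}u^\dagger\|^2 \le c\|u^\dagger\|^2_{2-\eta_2}\lambda^{2-\theta_2-\theta_0}.$$

Note that $\theta_1$ is chosen so that $\mathbb{E}\|\xi\|^2_{2\beta-2\ell-\eta_1} < \infty$, that is, by Lemma 2.5(iii), it suffices to have $\theta_1 > \frac{s_0}{\Delta}$. Noticing that by choosing $s$ arbitrarily close to $s_0$, we can have $\frac{s+2\ell-\beta}{\Delta}$ arbitrarily close to $\frac{2\ell-\beta+s_0}{\Delta}$, and since $\theta_1 + \theta_0 > \frac{2\ell-\beta+s_0}{\Delta}$, we deduce that the third term in equation (6.4) is always dominated by the second term. Combining, we have that

$$SPC \le c\mathbb{E}(\kappa^2)\lambda^{-\theta_0}\left(\lambda^{2-\theta_2} + \frac{1}{n}\lambda^{-\theta_1}\right) = c\mathbb{E}(\kappa^2)\lambda^{-\theta_0}\left(n^{\theta_2-2}\tau^{2\theta_2-4} + n^{\theta_1-1}\tau^{2\theta_1}\right).$$

ii) Suppose $\beta - 2\ell > 0$. Using the Propositions 5.1 and 5.2 we have that

$$\|\lambda\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-1}u^\dagger\|^2 \le c\|\lambda\mathcal{B}_\lambda^{-1}\mathcal{C}_0^{-1}u^\dagger\|^2_{\beta-2\ell} \le c\|u^\dagger\|^2_{2-\eta_2}\lambda^{2-\theta_2}$$

and

$$\mathbb{E}\Big\|\frac{1}{\sqrt{n}}\mathcal{B}_\lambda^{-1}\mathcal{A}^{-1}\mathcal{C}_1^{-1}\xi\Big\|^2 \le c\,\mathbb{E}\Big\|\frac{1}{\sqrt{n}}\mathcal{B}_\lambda^{-1}\mathcal{A}^{-1}\mathcal{C}_1^{-1}\xi\Big\|^2_{\beta-2\ell} \le c\,\mathbb{E}\|\xi\|^2_{2\beta-2\ell-\eta_1}\frac{1}{n}\lambda^{-\theta_1},$$

where as before $\theta_1 > \frac{s_0}{\Delta}$. The third term in equation (6.4) is again dominated by the second term, since on the one hand $\theta_1 > \frac{s_0}{\Delta}$ and on the other hand, since $\beta - 2\ell > 0$, we can always choose $\{\beta - 2\ell\} \vee \{s_0\} < s \le 1 \wedge \{s_0 + \beta - 2\ell\}$ to get $\frac{s+2\ell-\beta}{\Delta} \le \frac{s_0}{\Delta}$. Combining the three estimates we have that

$$SPC \le c\mathbb{E}(\kappa^2)\left(n^{\theta_2-2}\tau^{2\theta_2-4} + n^{\theta_1-1}\tau^{2\theta_1}\right).$$

In both cases, the common term in the parenthesis consists of a decreasing and an increasing term in $\tau$, thus we can optimize by choosing $\tau = \tau(n) = n^p$ making the two terms equal, that is, $p = \frac{\theta_2-\theta_1-1}{2\theta_1-2\theta_2+4}$, to get the claimed rates. □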

7. Example. In this section we present a nontrivial example satisfying Assumptions 2.1 and 2.4.

Let $\Omega \subset \mathbb{R}^d$, $d = 1,2,3$, be a bounded and open set. We define $\mathcal{A}_0 := -\Delta$, where $\Delta$ is the Dirichlet Laplacian which is the Friedrichs extension of the classical Laplacian defined on $C_0^\infty(\Omega)$, that is, $\mathcal{A}_0$ is a self-adjoint operator with a domain $\mathcal{D}(\mathcal{A}_0)$ dense in $L^2(\Omega)$ [15]. For $\partial\Omega$ sufficiently smooth we have $\mathcal{D}(\mathcal{A}_0) = H^2(\Omega) \cap H_0^1(\Omega)$. It is well known that $\mathcal{A}_0$ has a compact inverse and that it possesses an eigensystem $\{\rho_k^2, e_k\}_{k=1}^\infty$ where the eigenfunctions $\{e_k\}$ form a complete orthonormal basis of $L^2(\Omega)$ and the eigenvalues $\rho_k^2$ behave asymptotically like $k^{2/d}$ [1].

We study the Bayesian inversion of the operator $\mathcal{A} := \mathcal{A}_0 + \mathcal{M}_q$ where $\mathcal{M}_q : L^2(\Omega) \to L^2(\Omega)$ is the multiplication operator by a nonnegative function $q \in W^{2,\infty}(\Omega)$. Note that by the Hölder inequality the operator $\mathcal{M}_q$ is bounded. We assume that the observational noise is white, so that $\mathcal{C}_1 = I$, and we set the prior covariance operator to be $\mathcal{C}_0 = \mathcal{A}_0^{-2}$.

The operator $\mathcal{C}_0$ is trace class. Indeed, let $\lambda_k^2 = \rho_k^{-4}$ be its eigenvalues. Then they behave asymptotically like $k^{-4/d}$ and $\sum_{k=1}^\infty \lambda_k^2 < \infty$ for $d < 4$. Furthermore, we have that $\sum_{k=1}^\infty \lambda_k^{2(1-\sigma)} \le c\sum_{k=1}^\infty k^{-\frac{4(1-\sigma)}{d}} < \infty$ provided $\sigma < 1 - \frac{d}{4}$, that is, the Assumption 2.1 is satisfied with

$$\sigma_0 = 1 - \frac{d}{4}.$$

We define the Hilbert scale induced by $\mathcal{C}_0 = \mathcal{A}_0^{-2}$, that is, $(X^s)_{s \in \mathbb{R}}$, for $X^s := \overline{\mathcal{M}}^{\|\cdot\|_s}$, where

$$\mathcal{M} = \bigcap_{k=0}^\infty \mathcal{D}(\mathcal{A}_0^{k}), \quad \langle u, v\rangle_s := \langle \mathcal{A}_0^s u, \mathcal{A}_0^s v\rangle \quad \text{and} \quad \|u\|_s := \|\mathcal{A}_0^s u\|.$$

Observe, $X^0 = X = L^2(\Omega)$.

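Our aim is to show that $\mathcal{C}_1 \simeq \mathcal{C}_0^\beta$ and $\mathcal{A}^{-1} \simeq \mathcal{C}_0^\ell$, where $\beta = 0$ and $\ell = \frac{1}{2}$, in the sense of the Assumptions 2.4. We have $\Delta = 2\ell - \beta + 1 = 2$. Since for $d = 1, 2, 3$ we have $0 < s_0 < 1$, the Assumption 2.4(1) is satisfied. Moreover, note that since $\mathcal{C}_1 = I$ the Assumptions 2.4(3) and (4) are trivially satisfied.

We now show that Assumptions 2.4(2), (5), (6) are also satisfied. In this example the three assumptions have the form

2. $\|(\mathcal{A}_0 + \mathcal{M}_q)^{-1}u\| \asymp \|\mathcal{A}_0^{-1}u\|$, $\forall u \in X^{-1}$;

5. $\|\mathcal{A}_0^{s}(\mathcal{A}_0 + \mathcal{M}_q)^{-1}u\| \le c_3\|\mathcal{A}_0^{s-1}u\|$, $\forall u \in X^{s-1}$, $\forall s \in (s_0,1]$;

6. $\|\mathcal{A}_0^{-\eta}(\mathcal{A}_0 + \mathcal{M}_q)^{-1}u\| \le c_4\|\mathcal{A}_0^{-\eta-1}u\|$, $\forall u \in X^{-\eta-1}$, $\forall \eta \in [-1,1]$.

Observe that Assumption (5) is implied by Assumption (6).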

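Lemma 7.1. The operator $(\mathcal{A}_0 + \mathcal{M}_q)^{-1}\mathcal{A}_0$ is bounded when considered as an operator (i) $X \to X$, (ii) $X^{-1} \to X^{-1}$ and (iii) $X^1 \to X^1$.

Proof.

i) $(\mathcal{A}_0 + \mathcal{M}_q)^{-1}\mathcal{A}_0 = (I + \mathcal{A}_0^{-1}\mathcal{M}_q)^{-1}$, where $\mathcal{A}_0^{-1} : X \to X$ is compact and $\mathcal{M}_q : X \to X$ is bounded, thus $K := \mathcal{A}_0^{-1}\mathcal{M}_q : X \to X$ is compact. Hence, by the Fredholm Alternative [12, §27, Theorem 9], it suffices to show that $-1$ is not an eigenvalue of $K$. Indeed, if there exists $u \in X$ such that $\mathcal{A}_0^{-1}\mathcal{M}_q u = -u$, then $\mathcal{M}_q u = -\mathcal{A}_0 u$, therefore $u$ satisfies $(\mathcal{A}_0 + \mathcal{M}_q)u = 0$. Since $\mathcal{A}_0 + \mathcal{M}_q$ is positive-definite we have that $u = 0$, thus $-1$ is not an eigenvalue of $K$.

ii) The claim is equivalent to the inequality

$$\|\mathcal{A}_0^{-1}(\mathcal{A}_0 + \mathcal{M}_q)^{-1}u\| \le c\|\mathcal{A}_0^{-2}u\|, \quad \forall u \in X^{-2}.$$

Put $v = (\mathcal{A}_0 + \mathcal{M}_q)^{-1}u$ and note that we want to estimate $w = \mathcal{A}_0^{-1}v$, where $v$ satisfies $\mathcal{A}_0 v + \mathcal{M}_q v = u$, or multiplying by $\mathcal{A}_0^{-2}$, $w + \mathcal{A}_0^{-2}\mathcal{M}_q\mathcal{A}_0 w = \mathcal{A}_0^{-2}u$. The operator $K = \mathcal{A}_0^{-2}\mathcal{M}_q\mathcal{A}_0$ is compact in $X$. Indeed, since $\mathcal{A}_0^{-1}$ is compact, it suffices to show that $\mathcal{A}_0^{-1}\mathcal{M}_q\mathcal{A}_0$ is bounded in $X$. By duality for $u \in \mathcal{M}$ we have

$$\|\mathcal{A}_0^{-1}\mathcal{M}_q\mathcal{A}_0 u\| = \sup_{\|\phi\|=1}\langle \mathcal{A}_0^{-1}\mathcal{M}_q\mathcal{A}_0 u, \phi\rangle = \sup_{\|\phi\|=1}\langle u, \mathcal{A}_0\mathcal{M}_q\mathcal{A}_0^{-1}\phi\rangle$$
$$\le \|u\|\sup_{\|\phi\|=1}\|\mathcal{A}_0\mathcal{M}_q\mathcal{A}_0^{-1}\phi\| \le c\|u\|\,\|q\|_{W^{2,\infty}(\Omega)},$$

since

$$\|\mathcal{A}_0\mathcal{M}_q\mathcal{A}_0^{-1}\phi\| = \|\Delta(q\mathcal{A}_0^{-1}\phi)\| = \|(\Delta q)\mathcal{A}_0^{-1}\phi + 2(\nabla q)\cdot(\nabla\mathcal{A}_0^{-1}\phi) + q\Delta\mathcal{A}_0^{-1}\phi\|$$
$$\le \|q\|_{W^{2,\infty}(\Omega)}\left(\|\mathcal{A}_0^{-1}\phi\| + \|\nabla\mathcal{A}_0^{-1}\phi\| + \|\phi\|\right) \le c\|q\|_{W^{2,\infty}(\Omega)}\|\phi\|.$$

Note that $-1$ cannot be an eigenvalue of $K$ since in that case we would have $\mathcal{A}_0^{-2}\mathcal{M}_q\mathcal{A}_0 u = -u$ for some $u \in X \setminus \{0\}$, thus $\mathcal{M}_q\mathcal{A}_0 u = -\mathcal{A}_0^2 u$, and setting $\psi = \mathcal{A}_0 u$ we would get $\mathcal{A}_0\psi + \mathcal{M}_q\psi = 0$, thus since $\mathcal{A}_0 + \mathcal{M}_q$ is positive-definite $\psi = 0$ and since $\mathcal{A}_0$ is positive-definite $u = 0$. By the Fredholm Alternative we have that $(I + K)^{-1}$ is bounded in $X$, hence $\|w\| \le \|(I + K)^{-1}\|_{\mathcal{L}(X)}\|\mathcal{A}_0^{-2}u\|$ or equivalently

$$\|\mathcal{A}_0^{-1}(\mathcal{A}_0 + \mathcal{M}_q)^{-1}u\| \le c\|\mathcal{A}_0^{-2}u\|, \quad \forall u \in X^{-2}.$$

iii) The claim is equivalent to the inequality

$$\|\mathcal{A}_0(\mathcal{A}_0 + \mathcal{M}_q)^{-1}u\| \le c\|u\|, \quad \forall u \in X.$$

Put $v = (\mathcal{A}_0 + \mathcal{M}_q)^{-1}u$ and note that we want to estimate $w = \mathcal{A}_0 v$, where $v$ satisfies $\mathcal{A}_0 v + \mathcal{M}_q v = u$ or equivalently $w + \mathcal{M}_q\mathcal{A}_0^{-1}w = u$. The operator $K = \mathcal{M}_q\mathcal{A}_0^{-1}$ is compact in $X$, since $\mathcal{A}_0^{-1}$ is compact and $\mathcal{M}_q$ is bounded in $X$. Note that $-1$ cannot be an eigenvalue of $K$ since if there exists a $u \in X \setminus \{0\}$ such that $\mathcal{M}_q\mathcal{A}_0^{-1}u = -u$, then setting $\mathcal{A}_0^{-1}u = \psi$ we have that $\mathcal{M}_q\psi = -\mathcal{A}_0\psi$, thus since $\mathcal{A}_0 + \mathcal{M}_q$ is positive-definite we have $\psi = 0$ and since $\mathcal{A}_0^{-1}$ is positive-definite we have $u = 0$. By the Fredholm Alternative we conclude that $(I + K)^{-1}$ is bounded in $X$, hence $\|w\| \le \|(I + K)^{-1}\|_{\mathcal{L}(X)}\|u\|$ or equivalently that

$$\|\mathcal{A}_0(\mathcal{A}_0 + \mathcal{M}_q)^{-1}u\| \le c\|u\|, \quad \forall u \in X.$$

□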

By direct application of [17, Theorem 4.36 and Theorem 1.18] we have the following interpolation result:

Proposition 7.2. The couple $(X^0, X^1)$ is an interpolation couple and for every $\theta \in [0,1]$ we have $(X^0, X^1)_{\theta,2} = X^\theta$. Furthermore, the couple $(X^0, X^{-1})$ is also an interpolation couple and for every $\theta \in [0,1]$ we have $(X^0, X^{-1})_{\theta,2} = X^{-\theta}$.

Proposition 7.3. The Assumptions 2.4 are satisfied for this example.

Proof. We need to show that the Assumptions 2.4(2) and (6) hold.

i) We first prove Assumption 2.4(2). By (i) of the last lemma we have

$$\|(\mathcal{A}_0 + \mathcal{M}_q)^{-1}u\| \le c\|\mathcal{A}_0^{-1}u\|, \quad \forall u \in X^{-1}.$$

We need to show that $\|(\mathcal{A}_0 + \mathcal{M}_q)^{-1}u\| \ge c\|\mathcal{A}_0^{-1}u\|$, $\forall u \in X^{-1}$, which is equivalent to $\|\mathcal{A}_0^{-1}(\mathcal{A}_0 + \mathcal{M}_q)u\| \le c\|u\|$, $\forall u \in X$. Indeed,

$$\|\mathcal{A}_0^{-1}(\mathcal{A}_0 + \mathcal{M}_q)u\| \le \|u\| + \|\mathcal{A}_0^{-1}(qu)\| \le (1 + c\|q\|_\infty)\|u\|,$$

since $\mathcal{A}_0^{-1}$ is bounded and by the Hölder inequality.

ii) For the proof of Assumption 2.4(6) it suffices to show that $L := (\mathcal{A}_0 + \mathcal{M}_q)^{-1}\mathcal{A}_0 \in \mathcal{L}(X^\eta)$, $\forall \eta \in [-1,1]$. By Lemma 7.1 we have that $L \in \mathcal{L}(X^0) \cap \mathcal{L}(X^1)$ and $L \in \mathcal{L}(X^0) \cap \mathcal{L}(X^{-1})$, thus by [17, Theorem 1.6] and the last proposition, we get the result.

□

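We can now apply Corollary 6.4 and Corollary 6.5 to get the following convergence result.

Theorem 7.4. Let $u^\dagger \in X^\gamma$, $\gamma \ge 1$. Then, for $\tau = \tau(n) = n^{\frac{4-d-4\gamma-\varepsilon}{8\gamma+8+2d+\varepsilon}}$, the convergence in (1.12) holds with

$$\varepsilon_n = \begin{cases} n^{-\frac{2\gamma}{4+d+4\gamma+\varepsilon}}, & \text{if } \gamma < 3, \\ n^{-\frac{6}{16+d+\varepsilon}}, & \text{if } \gamma \ge 3, \end{cases}$$

for $\varepsilon > 0$ arbitrarily small and where $d = 1,2,3$ is the dimension. Furthermore, for $t \in [-1,1]$, for the same choice of $\tau$, we have

$$\mathbb{E}\|m^\dagger_\lambda - u^\dagger\|^2_t \le \begin{cases} cn^{-\frac{4\gamma-4t}{4+d+4\gamma+\varepsilon}}, & \text{if } \gamma < 3, \\ cn^{-\frac{12-4t}{16+d+\varepsilon}}, & \text{if } \gamma \ge 3. \end{cases}$$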



8. The Diagonal Case. In the case where $\mathcal{C}_0$, $\mathcal{C}_1$ and $\mathcal{A}$ are all diagonalizable
in the same eigenbasis our assumptions are trivially satisfied, provided $\Delta > 2s_0$.
In [14], sharp convergence rates are obtained for the convergence in (1.12), in the
case where the three relevant operators are simultaneously diagonalizable and have
spectra that decay algebraically; the authors only consider the case $\mathcal{C}_1 = I$ since in
this diagonal setting the colored noise problem can be reduced to the white noise
one. The rates in [14] agree with the minimax rates provided the scaling of the
prior is optimally chosen [4]. In Figure 1 (cf. Section 1) we have in green the rates
of convergence predicted by Corollary 6.5 and in blue the sharp convergence rates
from [14], plotted against the regularity of the true solution, $u^\dagger \in X^\gamma$, in the case
where $\beta = \ell = \frac{1}{2}$ and $\mathcal{C}_0$ has eigenvalues that decay like $k^{-2}$. In this case $s_0 = \frac{1}{2}$
and $\Delta = \frac{3}{2}$, so that $\Delta > 2s_0$.

As explained in Remark 6.3, the minimum regularity for our method to work is
$\gamma = 1$ and our rates saturate at $\gamma = 1 + \Delta$, that is, in this example at $\gamma = 2.5$. We
note that for $\gamma \in [1, 2.5]$ our rates agree, up to $\varepsilon > 0$ arbitrarily small, with the
sharp rates obtained in [14]; for $\gamma > 2.5$ our rates are suboptimal and for $\gamma < 1$
the method fails. In [14], the convergence rates are obtained for $\gamma > 0$ and the
saturation point is at $\gamma = 2\Delta$, that is, in this example at $\gamma = 3$. In general the
PDE method can saturate earlier (if $2\ell - \beta < 0$), at the same time (if $2\ell - \beta = 0$), or
later (if $2\ell - \beta > 0$) compared to the diagonal method presented in [14]. However,
the case $2\ell - \beta > 0$, in which our method saturates later, is also the case in which
our rates are suboptimal, as explained in Remark 6.3(iv).
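The saturation points quoted above depend only on $\Delta$, so they can be compared with a few lines of arithmetic. The sketch below is illustrative only: it takes the two formulas stated in the text ($\gamma = 1 + \Delta$ for the PDE method and $\gamma = 2\Delta$ for the diagonal method of [14]) as given, and the function names are hypothetical.

```python
def saturation_points(delta):
    """Regularity gamma beyond which each method's rate stops improving."""
    pde = 1 + delta        # PDE method, as stated via Remark 6.3
    diagonal = 2 * delta   # diagonal method of [14], as quoted in the text
    return pde, diagonal


def compare(delta):
    """Say which of the two saturation points comes first for a given delta."""
    pde, diagonal = saturation_points(delta)
    if pde < diagonal:
        return "PDE method saturates earlier"
    if pde > diagonal:
        return "PDE method saturates later"
    return "both saturate at the same regularity"


if __name__ == "__main__":
    # The example of this section uses delta = 3/2.
    print(saturation_points(1.5))  # (2.5, 3.0)
    print(compare(1.5))
```

For the example of this section, $\Delta = 3/2$ gives the saturation points 2.5 and 3 quoted above.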

The discrepancies can be explained by the fact that in Proposition 5.2, the choice
of $\theta_2$, which determines both the minimum requirement on the regularity of $u^\dagger$ and
the saturation point, is the same for both of the operator norm bounds. This means
that on the one hand to get convergence of the term $\|\lambda \mathcal{B}_\lambda^{-1} \mathcal{C}_0^{-1} u^\dagger\|$ in equation (6.4)
in the proof of Theorem 6.2, we require conditions which secure the convergence
in the stronger $X^1$-norm and on the other hand the saturation rate for this term
is the same as the saturation rate in the weaker $X^{\beta-2\ell}$-norm. For example, when
$\beta - 2\ell = 0$ the saturation rate in the PDE method is the rate of the $\mathcal{X}$-norm, hence
the agreement of the saturation point with the rates in [14]. In particular, we have
agreement of the saturation rate when $\beta = \ell = 0$, which corresponds to the problem
where we directly observe the unknown function polluted by white noise (termed
the white noise model).

9. Conclusions. We have presented a new method of identifying the poste- 
rior distribution in a conjugate Gaussian Bayesian linear inverse problem setting 
(Section 4). We used this identification to examine the posterior consistency of 
the Bayesian approach in a frequentist sense (Section 6). We provided convergence 
rates for the convergence of the expectation of the mean error in a range of norms 
(Theorem 6.1, Corollary 6.4). We also provided convergence rates for the square 
posterior contraction (Theorem 6.2, Corollary 6.5). Our methodology assumed a 
relation between the prior covariance, the noise covariance and the forward oper- 
ator, expressed in the form of norm equivalence relations (Assumptions 2.4). We 
considered Gaussian noise which can be white. In order for our methods to work 
we required a certain degree of ill-posedness compared to the regularity of the prior 
(Assumption 2.4(1)) and for the convergence rates to be valid a certain degree of 
regularity of the true solution. In the case where the three involved operators are 
all diagonalizable in the same eigenbasis, when the problem is sufficiently ill-posed 
and for a range of values of $\gamma$, the parameter expressing the regularity of the true
solution, our rates agree (up to $\varepsilon > 0$ arbitrarily small) with the sharp (minimax)
convergence rates obtained in [14] (Section 8). 

The methodology presented in this paper is extended to drift estimation for 
diffusion processes in [19]. Future research includes the extension to an abstract 
setting which includes both the present paper and [19] as special cases. Other 
possible directions are the consideration of nonlinear inverse problems, the use 
of non-Gaussian priors and/or noise and the extension of the credibility analysis 
presented in [14] to a more general setting. 